Preface


In a lecture at the University of Lille on December 7, 1854, Louis Pasteur noted
that in the field of observational science, “le hasard ne favorise que les esprits préparés”
(“fortune favors the prepared mind”). We would suggest that this statement is
equally true in the field of aviation. However, reversing the statement might make it
even more cogent because, for aviators, misfortune punishes the unprepared mind.
To survive in the demanding domain of aviation, one needs to approach the task fully 
prepared. This means having a comprehensive knowledge not only of weather, aerodynamics,
propulsion, navigation, and all the other technical disciplines, but also of
what is simultaneously the most fragile and most resilient, the most unreliable and 
the most adaptable component: the human being. The study of aviation psychology 
can provide some of that knowledge and offer better preparation for the demands that 
a life or an hour in aviation will make. 

This book is about applied psychology. Specifically, it is concerned with the 
application of psychological principles and techniques to the specific situations and 
problems of aviation. The book is meant to inform the student of psychology about 
how the discipline is applied to aviation; even more, it is meant to inform the student 
of aviation about how psychology can be used to address his or her concerns. We 
attempt to maintain this balance of perspectives and needs throughout the book; 
however, when we slip, we do so in favor of the student of aviation. Many books have 
been written by psychologists for psychologists. Few, if any, books have been written
by psychologists for pilots. It is to this neglected segment that we offer the main
thrust of this work. 

The overall goal of the book is to make pilots aware of the benefits of psychology 
and its application for improving aviation operations, as well as to provide specific 
information that pilots can use in their daily operations. In addition to making pilots 
aware of the benefits of psychology, the book should also make pilots informed consumers
of psychological research and studies so that they may better evaluate and
implement future products in the field of aviation psychology.

We would like to express our gratitude to colleagues and students who have 
read parts of the book or the complete manuscript and provided us with valuable 
feedback. In particular, we wish to mention Dr. Kjell Mjøs and military psychologist
Live Almås-Sørensen. The book was first published in Norwegian with the
title Luftfartspsykologi (2008) by the publisher Fagbokforlaget. Special thanks are 
offered to Martin Rydningen for helping us with the translation process, both from 
English to Norwegian and from Norwegian to English. 


Monica Martinussen 
David Hunter 
Introduction


1.1 WHAT IS AVIATION PSYCHOLOGY? 

Because the primary target of this book is the student of aviation, rather than the 
student of psychology, it seems prudent to begin with a few definitions. This will 
set some bounds for our discussions and for the reader’s expectations. The title of 
the book includes two key terms: “aviation psychology” and “human factors.” We 
included both these terms because they are often used interchangeably, although that 
is a disservice to both disciplines. Although we will touch on some of the traditional 
areas of human factors in the chapter on the design of aviation systems, our primary 
focus is on aviation psychology. Therefore, we will dwell at some length on what we 
mean by that particular term. 

Psychology* is commonly defined as the science of behavior and mental processes 
of humans, although the behavior of animals is also frequently studied—usually 
as a means to understand human behavior better. Within this broad area, there are 
numerous specialties. The American Psychological Association (APA), the largest 
professional organization of psychologists, lists over 50 divisions, each representing a 
separate aspect of psychology. These include several divisions concerned with various 
aspects of clinical psychology along with divisions concerned with such diverse issues 
as consumer behavior, school psychology, rehabilitation, the military, and addiction. 
All of these are concerned with understanding how human behavior and mental processes
influence or are influenced by the issues of their particular domain.

Clearly, psychology covers a very broad area: Literally, any behavior or thought is 
potential grist for the psychologist’s mill. To understand exactly what this book will 
cover, let us consider what we mean by aviation psychology. Undoubtedly, students of 
aviation will know what the first part of the term means, but what is included under 
psychology, and why do we feel justified, even compelled, to distinguish between 
aviation psychology and the rest of the psychological world? 

First, let us immediately dismiss the popular image of psychology. We do not 
include in our considerations of aviation psychology reclining on a couch recounting
our childhood and the vicissitudes of our emotional development. That popular
image of psychology belongs more to the area of clinical psychology, or perhaps even 
psychoanalysis. Although clinical psychology is a major component of the larger 
field of psychology, it has little relevance to aviation psychology. That is not to say 
that pilots and others involved in aviation are not subject to the same mental foibles 
and afflictions that beset the rest of humanity. Neither would we suggest that aspects
of the human psyche usually addressed in a clinical setting could have no influence
on human performance in an aviation setting. Quite the opposite, we assert that all
aspects of the mental functioning of pilots, maintenance personnel, air traffic controllers,
and the supporting cadre inescapably influence behavior for better or worse.

* The term “psychology” is derived from the Greek word psyche, meaning both butterfly and soul.
Psychology was first used as part of a course title for lectures given in the sixteenth century by Philip
Melanchthon.

Rather, we wish to dissociate aviation psychology from the psychotherapeutic 
focus of traditional clinical psychology. Aviation psychology may concern itself with 
the degree of maladaptive behavior evidenced by excessive drinking or with the 
confused ideation associated with personality disorders. However, it does so for the 
purpose of understanding and predicting the effects of those disorders and behaviors 
on aviation-related activities, rather than for the purpose of effecting a cure. 

Ours is a much more basic approach. We are concerned not only with the behavior 
(what people do) and ideation (what people think) of those with various mental disturbances,
but also with how people in general behave. Psychology at its most inclusive
level is the study of the behavior of all people. Psychology asks why, under certain 
conditions, people behave in a certain way, and under different conditions they behave 
in a different way. How do prior events, internal cognitive structures, skills, knowledge,
abilities, preferences, attitudes, perceptions, and a host of other psychological
constructs (see the later discussion of constructs and models) influence behavior? 
Psychology asks these questions, and psychological science provides the mechanism 
for finding answers. This allows us to understand and to predict human behavior. 

We may define aviation psychology as the study of individuals engaged in
aviation-related activities. The goal of aviation psychology, then, is to understand and
to predict the behavior of individuals in an aviation environment. Being able, even 
imperfectly, to predict behavior has substantial benefits. Predicting accurately how a 
pilot will react (behave) to an instrument reading will allow us to reduce pilot error 
by designing instruments that are more readily interpretable and that do not lead 
to incorrect reactions. Predicting how a maintenance technician will behave when 
given a new set of instructions can lead to increased productivity through reduction 
of the time required to perform a maintenance action. Predicting how the length of 
rest breaks will affect an air traffic controller faced with a traffic conflict can lead 
to improved safety. Finally, predicting the result of a corporate restructuring on the 
safety culture of an organization can identify areas in which conflict is likely to 
occur and areas in which safety is likely to suffer. 

From this general goal of understanding and predicting the behavior of individuals
in the aviation environment, we can identify three more specific goals: first, to
reduce error by humans in aviation settings; second, to increase productivity; and
third, to increase the comfort of both the workers and their passengers. To achieve 
these goals requires the coordinated activities of many groups of people. These 
include pilots, maintenance personnel, air traffic control operators, the managers 
of aviation organizations, baggage handlers, fuel truck drivers, caterers, meteorologists,
dispatchers, and cabin attendants. All of these groups, plus many more, have a
role in achieving the three goals of safety, efficiency, and comfort. However, because 
covering all these groups is clearly beyond the scope of a single book, we have chosen
to focus on the pilot, with only a few diversions into the activities of the other
groups. Another reason for choosing pilots is that the majority of research has been 
conducted on pilots. This is slowly changing, and more research is being conducted 
using air traffic controllers, crew members, and other occupational groups involved
in aviation. 

In this, we enlist contributions from several subdisciplines within the overall 
field of psychology. These include engineering psychology and its closely related 
discipline of human factors, personnel psychology, cognitive psychology, and 
organizational psychology. This listing also matches, to a fair degree, the order in 
which we develop our picture of aviation psychology—moving from fairly basic 
considerations of how the operator interacts with his or her aircraft (the domain 
of engineering psychology and human factors) through considerations of how best 
to select individuals to be trained as pilots (the domain of personnel and training
psychology). Cognitive psychology also contributes to our understanding of
how individuals learn new tasks, along with providing us information on how 
best to structure jobs and training so that they match the cognitive structure of 
the learner. Finally, from organizational psychology we learn how the structure 
and climate of an organization can contribute to issues such as safety through the 
expectations for behavior fostered among members of the organizations, as well 
as by the reporting and management structure that the corporate executives put 
in place. 

Although aviation psychology draws heavily upon the other disciplines of psychology,
those other disciplines are also heavily indebted to aviation psychology
for many of their advances, particularly in the area of applied psychology. This is
due primarily to the historic ties of aviation psychology to military aviation. For a
number of reasons to be discussed in detail later, aviation, and pilots in particular,
has always been a matter of great concern to the military. Training of military
pilots is an expensive and lengthy process, so considerable attention has been given
since World War I to improving the selection of these individuals so as to reduce
failures in training, which is the province of personnel and training psychology.

Similarly, the great cost of aircraft and their loss due to accidents contributed to 
the development of engineering psychology and human factors. Human interaction
with automated systems, now a great concern in the computer age, has been a subject
of study in aviation for decades, beginning with the introduction of flight director
systems and continuing, in recent years, with advanced glass cockpits. Much of the research
developed in an aviation setting for these advanced systems is equally germane to the
advanced displays and controls that will soon appear in automobiles and trucks.

In addition, studies of the interaction of crew members on airliner flight decks,
and of the problems that ensue when one crew member does not clearly
assert his or her understanding of a potentially hazardous situation, have led to the
development of a class of training interventions termed crew resource management
(CRM). After a series of catastrophic accidents, the concept and techniques of CRM 
were developed by the National Aeronautics and Space Administration (NASA) and 
the airline industry to ensure that a crew operates effectively as a team. Building 
upon this research base from aviation psychology, CRM has been adapted for other 
settings, such as air traffic control centers, medical operating rooms, and military 
command and control teams. This is a topic we will cover in much more detail in a 
later chapter. 
1.2 WHAT IS RESEARCH?

Before we delve into the specifics of aviation psychology, however, it may be worthwhile
to consider in somewhat greater detail the general field of psychology. As noted
earlier, psychology is the science of behavior and mental processes. We describe it as
a science because psychologists use the scientific method to develop their knowledge
of behavior and mental processes. By agreeing to accept the scientific method as
the mechanism by which truth will be discovered, psychologists bind themselves to
the requirement to test their theories using empirical methods and to modify or
reject those that are not supported by the results. The APA defines scientific method 
as “the set of procedures used for gathering and interpreting objective information in 
a way that minimizes error and yields dependable generalizations.”* 


For a discussion of the “received view” of the philosophy of science, see 
Popper (1959) and Lakatos (1970). According to this view, science consists of 
bold theories that outpace the facts. Scientists continually attempt to falsify 
these theories but can never prove them true. For discussions on the application
of this philosophy of science to psychology, see Klayman and Ha (1987),
Poletiek (1996), and Dar (1987). 


At a somewhat less lofty level, scientific method consists of a series of fairly standardized
steps, using generally accepted research procedures:

• Identify a problem and formulate a hypothesis (sometimes called a theory). 

• Design an experiment that will test the hypothesis. 

• Perform the experiment, typically using experimental and control groups. 

• Evaluate the results from the experiment to see if the hypothesis was 
supported. 

• Communicate the results. 

For example, a psychologist might observe that a large number of pilot trainees 
fail during their training (the problem). The psychologist might form a hypothesis,
possibly incorporating other observations or information, that the trainees are
failing because they are fatigued and that the source of this fatigue is a lack of 
sleep. The psychologist might then formally state her hypothesis that the probability 
of succeeding in training is directly proportional to the number of hours of sleep 
received (the hypothesis). The psychologist could then design an experiment to test 
that hypothesis. 

In an ideal experiment (not likely to be approved by the organization training the 
pilots), a class of incoming trainees would be randomly divided into two groups. 
One group would be given X hours of sleep, and the other group would be given Y 
hours of sleep, where Y is smaller than X (design the experiment). The trainees would
be followed through the course and the numbers of failures in each group recorded
(perform the experiment). The results could then be analyzed using statistical methods
(evaluate the results) to determine whether, as predicted by the psychologist’s
hypothesis, the proportion of failures in group X was smaller than the proportion of
failures in group Y.

* http://www.psychologymatters.org/glossary.html#s

If the difference in failure rates between the two groups was in the expected 
direction and if it met the generally accepted standards for statistical significance, 
then the psychologist would conclude that her hypothesis was supported, and she 
would indicate this in her report (communicate the results). If the data she collected
from the experiment did not support the hypothesis, then she would have to reject her 
theory or modify it to take the results of the experiment into account. 
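The analysis step in this example can be sketched in code. This is a minimal illustration, not the procedure of any particular study: it assumes a one-sided two-proportion z-test with the normal approximation (one common choice; other tests, such as Fisher's exact test, are also used), and the class size, random seed, and failure counts are hypothetical.

```python
import math
import random


def assign_randomly(trainees, seed=None):
    """Randomly split trainee IDs into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(trainees)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def failure_rate_test(fail_x, n_x, fail_y, n_y):
    """One-sided two-proportion z-test (normal approximation).

    H1: the failure proportion in group X (more sleep) is smaller than
    in group Y (less sleep). Returns (z, p_value), using the pooled
    proportion expected under the null hypothesis of equal failure rates.
    """
    p_x = fail_x / n_x
    p_y = fail_y / n_y
    pooled = (fail_x + fail_y) / (n_x + n_y)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_x + 1.0 / n_y))
    z = (p_x - p_y) / se
    # Lower-tail probability: small when group X fails much less often than Y.
    p_value = normal_cdf(z)
    return z, p_value


# Hypothetical class of 100 trainees, split at random into groups X and Y.
group_x, group_y = assign_randomly(range(100), seed=1)

# Hypothetical outcome: 8 of 50 fail in group X, 18 of 50 fail in group Y.
z, p = failure_rate_test(8, len(group_x), 18, len(group_y))
print(f"z = {z:.2f}, one-sided p = {p:.3f}")  # z = -2.28, one-sided p = 0.011
```

With these invented counts the one-sided p value falls below the conventional .05 level, which corresponds to the case in which the psychologist would report her hypothesis as supported.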

Like research in other fields, psychological research must meet certain criteria in 
order to be considered scientific. The research must be 

falsifiable; the hypothesis or theory must be stated in a way that makes it possible
to reject it. If the hypothesis cannot be tested, then it does not meet the
standards for science. 

replicable; others should be able to repeat a study and get the same results. It is 
for this reason that reports of studies should provide enough detail for other 
researchers to repeat the experiment. 

precise; hypotheses must be stated as precisely as possible. For example, if 
we hypothesize that more sleep improves the likelihood of completing pilot 
training, but only up to some limit (that is, trainees need 8 hours of sleep, but 
additional hours beyond that number do not help), then our hypothesis should 
explicitly state that relationship. To improve precision, operational definitions
of the variables should be included that state exactly how a variable is
measured. Improved precision facilitates replication by other researchers. 

parsimonious; researchers should apply the simplest explanation possible to 
any set of observations. This principle, sometimes called Occam’s razor, 
means that if two explanations equally account for an observation, then the 
simpler of the two should be selected. 


1.3 GOALS OF PSYCHOLOGY 

Describe. Specify the characteristics and parameters of psychological phenomena
more accurately and completely. For example, studies have been conducted
of human short-term memory that very accurately describe the retention of 
information as a function of the amount of information to be retained. 

Predict. Predicting what people will do in the future, based on knowledge of 
their past and current psychological characteristics, is a vital part of many 
aviation psychology activities. For example, accurately predicting who will 
complete pilot training based on knowledge of their psychological test scores 
is important to the organization performing the training. Likewise, predicting
who is more likely to be in an aircraft accident based on psychological
test scores could also be valuable information for the person involved. 
Understand. This means being able to specify the relationships among variables;
in plain language, it means knowing the “how” and the “why,” which is what drives
psychologists and nonpsychologists alike. Once we understand, we are in a position
to predict and to influence.

Influence. Once we have learned why a person fails in training or has an accident,
we may be able to take steps to change the outcome. From our earlier
example, if we know that increasing the amount of sleep that trainees
receive improves their likelihood of succeeding in training, then we almost 
certainly will wish to change the training schedule to ensure that everyone 
gets the required amount of sleep every night. 

Psychology can also be broken down into several different general approaches. These 
approaches reflect the subject matter under consideration and, to a large degree, the 
methods and materials used. These approaches include: 

A behaviorist approach looks at how the environment affects behavior. 

A cognitive approach studies mental processes and is concerned with understanding
how people think, remember, and reason.

A biological approach is concerned with the internal physiological processes 
and how they influence behavior. 

A social approach examines how we interact with other people and emphasizes
the individual factors that are involved in social behavior, along with
social beliefs and attitudes. 

A developmental approach is primarily interested in emotional development, 
social development, and cognitive development, including the interactions 
among these three components. 

A humanistic approach focuses on individual experiences, rather than on people
in general.

The delineation of these six approaches may suggest more homogeneity than 
actually exists. Although some psychologists remain exclusively within one of these 
approaches (physiological psychologists are perhaps the best example), for the most 
part psychologists take a more eclectic view—borrowing concepts, methods, and 
theories from among the six approaches as it suits their purpose. Certainly, it would 
be very difficult to classify aviation psychologists into one of these six approaches. 

1.4 MODELS AND PSYCHOLOGICAL CONSTRUCTS 

The rules of science are met in other disciplines (e.g., chemistry, physics, or mathematics)
through the precise delineation of predecessors, actions, conditions, and
outcomes. In chemistry, for example, this is embodied in the familiar chemical equation
depicting the reaction between two or more elements or compounds. The chemical
equation for the generation of water from hydrogen and oxygen is unambiguous:

$$2\,\mathrm{H} + \mathrm{O} \rightarrow \mathrm{H_2O}$$

That is, two hydrogen atoms will combine with one oxygen atom to
form one molecule of water.
This is a simple but powerful model that lets chemists understand and predict
what will happen when these two elements are united. It also provides a very precise
definition of the model, which allows other scientists to test its validity. For
example, a scientist might ask: Are there any instances in which $\mathrm{H_3O}$ is produced?
Clearly, the production of a model such as this is a very desirable state and represents
the achievement of the goals, in the chemical domain, that were listed for psychology
earlier. Although psychology cannot claim to have achieved the same levels of
specificity as the physical sciences, great progress has nevertheless been made in 
specifying the relationships among psychological variables, often at a quantitative 
level. However, the level of specificity at present generally is inversely related to the 
complexity of the psychological phenomenon under investigation. 

Some of the earliest work in psychology dealt with psychophysics—generally 
including issues such as measurement of “just noticeable differences” (JNDs) 
in the tones of auditory signals or the weights of objects. In Leipzig, Germany, 
Ernst Weber (1795-1878) discovered a method for measuring internal mental
events and quantifying the JND. His observations are formulated into an equation
known as Weber’s law, which states that the just noticeable difference is a
constant fraction of the stimulus intensity already present (Corsini, Craighead, 
and Nemeroff 2001). 
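Weber’s law is often written as $\Delta I = kI$, where $I$ is the stimulus intensity already present, $\Delta I$ is the just noticeable difference, and $k$ is the Weber fraction for a given sensory dimension. The sketch below simply evaluates that relation; the Weber fraction of 0.02 for lifted weights is an assumed, illustrative value rather than a figure taken from this text.

```python
def weber_jnd(intensity, weber_fraction):
    """Just noticeable difference under Weber's law: delta_I = k * I."""
    if intensity <= 0 or weber_fraction <= 0:
        raise ValueError("intensity and weber_fraction must be positive")
    return weber_fraction * intensity


def is_noticeable(base, comparison, weber_fraction):
    """True if the change from base to comparison is at least one JND."""
    return abs(comparison - base) >= weber_jnd(base, weber_fraction)


K_WEIGHT = 0.02  # assumed Weber fraction for lifted weights (illustrative only)

# The same 10 g change is compared against a 100 g and a 1000 g baseline.
print(weber_jnd(100, K_WEIGHT), is_noticeable(100, 110, K_WEIGHT))
print(weber_jnd(1000, K_WEIGHT), is_noticeable(1000, 1010, K_WEIGHT))
```

Because the JND scales with the baseline, a 10 g change exceeds the 2 g JND at 100 g but falls short of the 20 g JND at 1000 g.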

More recent efforts have led to the development of several equations describing 
psychological phenomena in very precise models. These include Fitts’s law (Fitts 
1954), which specifies that the movement time (e.g., of a hand to a switch) is a logarithmic
function of distance when target size is held constant, and that movement
time is also a logarithmic function of target size when distance is held constant.
Mathematically, Fitts’s law is stated as follows:

$$MT = a + b \log_2\left(\frac{2A}{W}\right)$$


where 

MT = time to complete the movement 
a, b = parameters, which vary with the situation 
A = distance of movement from start to target center 
W = width of the target along the axis of movement 
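As a worked illustration, the functions below evaluate Fitts’s law for a given distance A and target width W. The parameters a and b must be estimated empirically for each device and situation; the values used here (a = 0.1 s, b = 0.15 s per bit) are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import math


def index_of_difficulty(distance, width):
    """Fitts's index of difficulty in bits: log2(2A / W)."""
    if distance <= 0 or width <= 0:
        raise ValueError("distance and width must be positive")
    return math.log2(2.0 * distance / width)


def movement_time(distance, width, a, b):
    """Predicted movement time MT = a + b * log2(2A / W), in the units of a and b."""
    return a + b * index_of_difficulty(distance, width)


A_SEC, B_SEC_PER_BIT = 0.1, 0.15  # hypothetical fitted parameters

# Reaching 160 mm to a 20 mm switch: 2A/W = 16, so the task is 4 bits (MT = 0.7 s).
print(movement_time(160, 20, A_SEC, B_SEC_PER_BIT))
# Doubling the switch width removes one bit of difficulty (MT = 0.55 s).
print(movement_time(160, 40, A_SEC, B_SEC_PER_BIT))
```

Halving the distance has the same effect as doubling the width, because only the ratio 2A/W enters the equation.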

Another such example is Hick’s law, which describes the time it takes a person 
to make a decision as a function of the possible number of choices (Hick 1952). This 
law states that, given n equally probable choices, the average reaction time (T) to
choose among them is

$$T = b \log_2(n + 1)$$

This law can be demonstrated experimentally by having a number of buttons with 
corresponding light bulbs. When one light bulb is lit randomly, the person must press 
the corresponding button as quickly as possible. By recording the reaction time, we 
can demonstrate that the average time to respond increases logarithmically with the number
of light bulbs. Although seemingly trivial statements of relationships, Hick’s and
Fitts’s laws are considered in the design of menus and submenus used in a variety of
aviation and nonaviation settings (Landauer and Nachbar 1985).
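The light-bulb demonstration can be mimicked numerically. The sketch below evaluates Hick’s law, $T = b \log_2(n + 1)$, for several numbers of equally likely bulbs; the slope b = 0.15 s per bit is a hypothetical value, since in practice b is estimated from experimental data.

```python
import math


def hick_reaction_time(n_choices, b):
    """Average choice reaction time T = b * log2(n + 1) for n equally likely choices."""
    if n_choices < 1:
        raise ValueError("n_choices must be at least 1")
    return b * math.log2(n_choices + 1)


B_SEC_PER_BIT = 0.15  # hypothetical slope

for n in (1, 3, 7, 15):
    t = hick_reaction_time(n, B_SEC_PER_BIT)
    print(f"{n:2d} bulbs: {t:.2f} s")
```

Each time n + 1 doubles (2, 4, 8, 16), the predicted reaction time grows by the same constant b, rather than in proportion to the number of bulbs.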


WHAT IS A MODEL? 

A model is a simplified representation of reality. It can be a physical, mathematical,
or logical representation of a system, entity, phenomenon, or process.
When we talk about a psychological model, we are usually referring to
a statement, or a series of statements, about how psychological constructs are
related or about how psychological constructs influence behavior. These models
can be very simple and just state that some things seem to be related. For
an example, see the description of the SHEL model. 

On the other hand, the model could be quite complex and make specific 
quantitative statements about the relationships among the constructs. For an 
example of this type of model, see the weather modeling study in which a 
mathematical modeling technique is used to specify how pilots combine 
weather information. 

Other models, such as those of human information processing or aeronautical
decision making, make statements about how information is processed by
humans or how they make decisions. A good model allows us to make predictions
about how changes in one part of the model will affect other parts.


Clearly, from some psychological research, very precise models may be constructed
of human sensory responses to simple stimuli. Similarly, early work on
human memory established with a fairly high degree of specificity the relationship
between the position of an item in a list of things to be remembered and the likelihood
of its being remembered (Ebbinghaus, 1885, as reprinted in Wozniak 1999).

In addition to highly specific, quantitative models, psychologists have also developed
models that specify qualitative or functional relationships among variables.
Some models are primarily descriptive and make no specific predictions about relationships
among variables other than to suggest that a construct exists and that, in
some unspecified way, it influences another construct or behavior. Some models propose
a particular organization of constructs or a particular flow of information or
events. The predicted relationships and processes of those models may be subject to
empirical tests to assess their validity, a very worthwhile characteristic of models.
Of particular interest* to the field of aviation psychology are models that deal with

• general human performance;

• skill acquisition and expertise development;

• human information processing;

• accident etiology; and

• decision making (specifically, aeronautical decision making).

* These are but a sampling of the many models currently available. For more information, consult Foyle
et al. (2005), Wickens et al. (2003), or Wickens and Holland (2000). An extensive review of human
performance models is also available from Leiden et al. (2001), who include task network, cognitive,
and vision models. Table 1 in the report by Isaac et al. (2002) also provides a comprehensive listing of
models.

1.5 HUMAN PERFORMANCE MODELS 

One of the more widely used models of human performance is the SHEL model, 
originated by Edwards (1988) and later modified by Hawkins (1993). The SHEL 
model consists of the following elements: 

• S—software: procedures, manuals, checklists, and literal software; 

• H—hardware: the physical system (aircraft, ship, operating suite) and 
its components; 

• E—environment: the situation in which the other elements (L, H, and S) 
operate, including working conditions, weather, organizational structure, 
and climate; and 

• L—liveware: the people (pilots, flight attendants, mechanics, etc.). 

This model is typically depicted as shown in Figure 1.1, which highlights the interrelationships
of the S, H, and L components and their functioning within the environment (E).

FIGURE 1.1 SHEL model.

Although this model is useful at an overall conceptual level, it is pedagogic rather
than prescriptive. That is, it serves to help educate people outside the disciplines of
psychology and human factors about the interactions and dependencies of the SHEL
elements. However, it makes no specific statements about the nature of those interactions
and no quantifiable predictions about the results of disruptions. Clearly, this is
a simple model of a very complex situation. As such, it has very little explanatory
power, although it does serve a general descriptive function.

Despite, or perhaps because of, its simplicity, the SHEL model has proven to
be a very popular model within human factors, and it is frequently used to explain
concepts relating to the interdependencies of these four elements. It is cited, for
example, by the Global Aviation Information Network (GAIN 2001) in its chapter on
human factors to illustrate the continuous interaction among the elements. It is also 
referenced in the UK Civil Aviation Authority (CAA 2002) publication on human 
factors in aircraft maintenance. The SHEL model has also been used extensively 
outside aviation, particularly in the field of medicine (cf. Bradshaw 2003; Molloy 
and O'Boyle 2005). 

1.5.1 Development of Expertise 

How a pilot develops from a novice to an expert is clearly an issue of keen interest 
to psychologists because of the general agreement that experts are safer than novices 
(an assumption that should not be accepted without question). Accordingly, several 
models have been utilized to help understand this process. Some of these models are 
taken from the general psychological literature on expertise development, and some 
are specifically adapted to the aviation setting. 

One model from the general literature on expertise development specifies five 
developmental levels—from novice to expert—that reflect an increasing capacity to 
internalize, abstract, and apply rules (Dreyfus and Dreyfus 1986): 

A novice learns basic facts, terminology, and rules and how they are applied in 
well-defined circumstances. 

An advanced beginner begins to develop a feel for rules through repeated 
practical application. The student begins to understand the use of concepts 
and rules in situations that are similar to those in prior examples. 

Competence means a deep-enough understanding of the rules to know when 
they are applicable and how to apply them in novel situations. 

Proficiency indicates a refined and internalized sense of the rules. 

An expert produces increasingly abstract representations and is able to map 
novel situations to the internalized representations. 

This model is primarily descriptive in that it makes no specific predictions regarding the transitions between states, other than the general suggestion that there is an increasing capacity to internalize, abstract, and apply rules. It leaves unanswered questions such as how the process might be accelerated or specifically how to measure competence at each of the hypothesized stages.

According to Fitts (1954; Fitts and Posner 1967), there are three phases to skill 
acquisition: 

• The cognitive phase is characterized by slow, declarative learning of primarily verbal information.

• The associative phase is characterized by the detection and elimination of errors in performance and the strengthening of connections.

• The autonomous phase is characterized by automated and rapid performance, requiring less deliberate attention and fewer resources.
Arguably, this model, like that of Dreyfus and Dreyfus, is primarily descriptive. However, Anderson (1982) has provided a quantitative formulation of the three-phase skill acquisition model. This quantitative model (ACT-R) can be used to make specific quantitative predictions, and as such is subject to more rigorous evaluation than the simply descriptive models. In addition, although the three-phase model was originally intended to apply to motor learning, it is also applicable to cognitive skill acquisition (VanLehn 1996), thus broadening its applicability.
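The sketch below shows what a "specific quantitative prediction" can look like. It uses the power law of practice, a well-documented regularity in skill acquisition that Anderson's work sets out to explain. The law states that the time to perform a task falls as a power function of the number of practice trials, T = a + b·N^(−c). This is an illustration under assumptions, not Anderson's model itself. The function name `practice_time` and the values of a, b, and c are hypothetical.

```python
def practice_time(trial, asymptote=2.0, gain=8.0, rate=0.5):
    """Predicted time (seconds) on a practice trial: T = a + b * N**(-c).

    All parameter values are hypothetical, for illustration only:
      asymptote (a): floor approached after extensive practice
      gain (b):      extra time needed on the first trial
      rate (c):      learning-rate exponent
    """
    if trial < 1:
        raise ValueError("trial numbers start at 1")
    return asymptote + gain * trial ** (-rate)


# Predicted times early and late in practice.
for n in (1, 4, 16, 64):
    print(n, round(practice_time(n), 2))
```

With these made-up parameters, every fourfold increase in practice halves the time remaining above the floor a. A purely descriptive model, such as that of Dreyfus and Dreyfus, makes no numerical claim of this kind that could be checked against data.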

1.6 MODELS OF HUMAN INFORMATION PROCESSING 

Perhaps the best known of the human-information-processing models is that proposed by Wickens (Wickens and Hollands 2000), which draws heavily upon previous research on human memory (Baddeley 1986), cognition (Norman and Bobrow 1975), and attention (Kahneman 1973). The Wickens model is characterized by the presence of discrete stages for the processing of information, the provision of both a working memory and a long-term memory, and a continuous feedback stream. The provision of an attention resource component is also notable because it implies a limited store of attention, which the human must allocate among all the ongoing tasks. Hence, attention is selective.

Clearly, this model is more sophisticated in its components and proposed interrelationships than the more descriptive models considered earlier. This level of sophistication and the wealth of detail provide ample opportunity to evaluate the validity of the model experimentally. They also make it a useful tool for understanding and predicting human interaction with complex systems.

1.7 MODELS OF ACCIDENT CAUSATION 

The predominant model of accident causation is the Reason (1990) model or, as it is 
sometimes called, the “Swiss-cheese” model. Because it is so widespread in regard to 
aviation safety, it will be described at length in the later chapter on safety and hence 
will not be described in detail here. For the present, let us simply note that the Reason model might properly be described as a process model, somewhere midway between the purely descriptive models, like that of Dreyfus and Dreyfus, and the highly structured model of Wickens. It describes the process by which accidents are allowed or prevented, but it also hypothesizes a rather specific hierarchy and timetable of events and conditions that lead to such adverse events.

Moving away from the individual person, there are also models that treat the relationships of organizations and the flow of information and actions within organizations. The current term for this approach is "safety management systems," and it has been adopted by the International Civil Aviation Organization (ICAO 2005) and by all the major Western regulatory agencies, including, for example, the U.S. Federal Aviation Administration, the UK Civil Aviation Authority (CAA 2002), Transport Canada (2001), and the Australian Civil Aviation Safety Authority (CASA 2002).
1.8 MODELS OF AERONAUTICAL DECISION MAKING (ADM)*

Following a study by Jensen and Benel (1977) that found that poor decision making 
was associated with about half of fatal general aviation accidents, a great deal of 
interest developed in understanding how pilots make decisions and how that process 
might be influenced. This interest led to the development of a number of prescriptive models that were based primarily on expert opinion. One such example is the "I'M SAFE" mnemonic device, which serves to help pilots remember to consider the six elements indicated by the IMSAFE letters: illness, medication, stress, alcohol, fatigue, and emotion.

The DECIDE model (Clarke 1986) could be considered both descriptive and prescriptive. That is, it not only describes the steps that a person takes in deciding on a course of action, but also can be used as a pedagogic device to train a process for making decisions. The DECIDE model consists of the following steps:

• D—Detect: the decision maker detects a change that requires attention. 

• E—Estimate: the decision maker estimates the significance of the change. 

• C—Choose: the decision maker chooses a safe outcome. 

• I—Identify: the decision maker identifies actions to control the change. 

• D—Do: the decision maker acts on the best options. 

• E—Evaluate: the decision maker evaluates the effects of the action. 

In an evaluation of the DECIDE model, Jensen (1988) studied 10 pilots. Half of the pilots were taught the DECIDE model through detailed analyses of accident cases, while the other half served as a control group. Following the training, the pilots were assessed in a simulated flight in which three unexpected conditions occurred, requiring decisions by the pilots. A review of the experimental flights indicated that all of the experimental group members who chose to fly (four of five) eventually landed safely, whereas all of the control group members who chose to fly (three of five) eventually crashed. Although the very small sample size precluded the usual statistical analysis, the results suggest some utility for teaching the model as a structured approach to good decision making.

As a result of this and other studies conducted at Ohio State University, Jensen 
and his associates (Jensen 1995, 1997; Kochan et al. 1997) produced the general 
model of pilot judgment shown in Figure 1.2 and the overarching model of pilot 
expertise shown in Figure 1.3. 

This overall model of pilot expertise was formulated on the basis of four studies of pilot decision making. These studies began with a series of unstructured interviews of pilots who, on the basis of experience and certification, were considered experts in the area of general aviation. The interviews were used to identify and compile characteristics of these expert pilots. Successive studies were used to identify the salient characteristics further, culminating in the presentation to the pilots of a plausible general aviation flight scenario using a verbal protocol methodology. The results from the final study, in combination with the earlier interviews, suggested that, when compared to competent pilots, expert pilots tended to (1) seek more quality information in a more timely manner, (2) make more progressive decisions to solve problems, and (3) communicate more readily with all available resources (Kochan et al. 1997).

* O'Hare (1992) provides an in-depth review of the multiple models of ADM.

FIGURE 1.2 General model of pilot judgment.

A somewhat different formulation of the ADM process was offered by Hunter 
(2002), who suggested that decision making is affected by 

• pilot’s knowledge; 

• pilot’s attitudes and personality traits; 

• pilot’s ability to find and use information effectively; 

• quality, quantity, and format of available information; 

• pilot’s ability to deal with multiple demands; 

• pilot’s repertoire of possible responses; 

• capabilities of aircraft and systems; 

• available outside support; and 

• outcomes of earlier decisions. 
FIGURE 1.3 ADM expertise model.

Hunter combined these elements, arranged as a sequence of events and influences, into the somewhat more general model of performance depicted in Figure 1.4. This model shares some aspects of Jensen's model, but specifies a greater level of detail. This greater level of detail allows more precise tests of the model to be conducted. For example, Hunter's model suggests that recognition is separate from interpretation and that those stages of processing are influenced by different pilot characteristics. Knowledge (memory) is involved in both stages; however, aspects of the pilot such as personality and risk tolerance have an impact on the interpretation stage, but not on the recognition stage. Experiments could thus be devised to test the predictions of this model.

In a different approach to understanding how pilots make decisions, Hunter, Martinussen, and Wiggins (2003) used a linear modeling technique to examine the weather-related decision-making processes of American, Norwegian, and Australian pilots. In this study, pilots were asked to assign a comfort rating to each of 27 weather scenarios, flown over three different routes. These data were then used to develop individual regression equations* for each pilot that described how each individual pilot combined information about weather conditions (cloud ceiling, visibility, and amount and type of precipitation) to make his or her comfort rating.
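The sketch below fits an individual regression equation for a single hypothetical pilot. It assumes, purely for illustration, that ceiling, visibility, and precipitation are each coded 1 (poor) to 3 (good), so that crossing all levels gives 27 scenarios. The study's actual scenario design, coding, and route factor are not reproduced here, and the "true" weights are invented. The helpers `solve` and `fit_weights` are hypothetical. They are written in plain Python (ordinary least squares via the normal equations) rather than taken from any statistics package.

```python
from itertools import product


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_weights(cues, ratings):
    """Least-squares fit of rating = b1*x1 + b2*x2 + ... + c.

    Returns [b1, b2, ..., c] by solving the normal equations (X'X) w = X'y.
    """
    rows = [list(x) + [1.0] for x in cues]  # trailing 1.0 carries the constant c
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ratings)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical design: (ceiling, visibility, precipitation), each coded 1..3.
scenarios = list(product([1, 2, 3], repeat=3))  # 3 * 3 * 3 = 27 scenarios

# Hypothetical pilot whose ratings follow these weights exactly (no noise).
true_weights = [0.8, 1.5, 0.4, -2.0]  # b_ceiling, b_visibility, b_precip, c
ratings = [sum(b * x for b, x in zip(true_weights, s)) + true_weights[-1]
           for s in scenarios]

fitted = fit_weights(scenarios, ratings)
print([round(w, 3) for w in fitted])  # should match true_weights
```

The hypothetical ratings contain no noise, so the fit recovers the weights exactly. With real ratings, the fitted weights are estimates, and their relative sizes show which cues a given pilot relied on most.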

Examination of the weights that the pilots used in combining the information allowed Hunter et al. to conclude that pilots among these diverse groups used a consistent weather decision model. For each group, a compensatory model was favored over a noncompensatory model. This means that the pilots were allowing a high score on one of the variables (e.g., ceiling) to compensate for a low score on another of the variables (e.g., visibility). In practical terms, this means that pilots might take off on a flight under potentially hazardous conditions (low visibility, but high ceiling) because of the way the information was combined.

* A regression equation is a mathematical equation that shows how information is combined by using weights assigned to each salient characteristic. The general form of the equation is $Y = b_1 x_1 + b_2 x_2 + \dots + c$, where $b_1$, $b_2$, etc. are the weights applied to each characteristic and $c$ is a constant. Most introductory texts on statistics include a discussion of linear regression. For a more advanced, but still very readable, description of the technique, see Licht (2001).

FIGURE 1.4 Hunter's performance model.
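The difference between the two kinds of rule can be sketched in a few lines. The weighted-sum rule below is compensatory. The conjunctive rule, one common form of noncompensatory model, requires every cue to clear its own minimum. The 1-to-3 coding, weights, cutoff, and minimums are hypothetical. They were chosen only to reproduce the situation described above (high ceiling, low visibility) and are not parameters from Hunter et al. (2003).

```python
def compensatory_accept(ceiling, visibility, w_ceiling=0.5, w_visibility=0.5, cutoff=2.0):
    """Weighted-sum (compensatory) rule: a good ceiling can offset poor visibility."""
    return w_ceiling * ceiling + w_visibility * visibility >= cutoff


def conjunctive_accept(ceiling, visibility, min_ceiling=2, min_visibility=2):
    """Conjunctive (noncompensatory) rule: each cue must meet its own minimum."""
    return ceiling >= min_ceiling and visibility >= min_visibility


# Hypothetical codes: 1 = poor, 2 = marginal, 3 = good.
high_ceiling_low_vis = (3, 1)
print(compensatory_accept(*high_ceiling_low_vis))  # True: 0.5*3 + 0.5*1 = 2.0 meets the cutoff
print(conjunctive_accept(*high_ceiling_low_vis))   # False: visibility is below its minimum
```

Under the compensatory rule, the good ceiling offsets the poor visibility and the flight is accepted. Under the conjunctive rule, the same conditions are rejected.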

Typically, psychological models become less quantitative and more qualitative as 
they attempt to account for more complex behavior. For example, contrast the very 
precise equations that relate response time to number of choices to the models that 
attempt to account for human information processing. However, the study by Hunter and colleagues demonstrates that, in some cases, we can establish quantitative relationships for relatively complex behavior and stimuli.
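The classic example of such a precise equation is the Hick-Hyman law. It predicts that choice reaction time increases linearly with the information, in bits, conveyed by the number of equally likely alternatives: RT = a + b·log2(n). A minimal sketch follows. The intercept and slope are hypothetical, since real values must be estimated for a given task and person. Hick's original formulation used log2(n + 1); the simpler form is shown here.

```python
import math


def choice_reaction_time(n_choices, intercept=0.2, slope=0.15):
    """Hick-Hyman law: RT = a + b * log2(n) for n equally likely alternatives.

    intercept (a, seconds) and slope (b, seconds per bit) are hypothetical values.
    """
    if n_choices < 1:
        raise ValueError("need at least one alternative")
    return intercept + slope * math.log2(n_choices)


for n in (1, 2, 4, 8):
    print(n, round(choice_reaction_time(n), 3))
```

Each doubling of the number of alternatives adds one bit of information, and therefore adds the slope b to the predicted reaction time.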

Although linear modeling provides a powerful technique for establishing quantitative models, even more powerful statistical modeling techniques are now available. Structural equation modeling (SEM) allows psychologists to specify variables and the quantitative relationships among those variables, and then to test the veracity of that model. Although a discussion of SEM is well beyond the scope of this book, let us simply say that it allows researchers to create models, such as those of Wickens, Jensen, or Hunter, and to assign specific numerical relationships among the processes and conditions. These numerical parameters may then be tested statistically and the validity of the hypothesized model tested empirically. The interested reader may consult Raykov and Marcoulides (2006) for an introduction to SEM; however, texts on LISREL, AMOS, and EQS, three of the common implementations of SEM, are also widely available. A few examples of the use of SEM can also be found in aviation, including mental workload of student pilots (Sohn and Jo 2003), mental workload and performance in combat aircraft (Svensson et al. 1997), and analysis of the psychometric properties of the U.S. Navy aviation selection test battery (Blower 1998).

All of these models present a view of reality from a different perspective. Arguably, none of them is true in some absolute sense. Rather, they all present a simplified abstraction of reality. Previously, we described the work on modeling of pilot weather-related decision making that we and Mark Wiggins from the University of Western Sydney conducted (Hunter et al. 2003). Although the mathematical modeling process produced very reliable results, we would certainly not argue that humans actually have small calculators in their heads that they use to evaluate such judgments. Rather, we would suggest that whatever is happening during the decision-making process can be predicted rather accurately using our mathematical model.

The distinction is important because later in this book we will discuss many 
psychological constructs and describe research that shows relationships between 
a construct and some outcome of interest. For example, we might look at how 
intelligence relates to completing training successfully. Or, we could examine 
how a person’s internality (the degree to which he or she believes himself or 
herself in control of his or her destiny) relates to accident involvement. Both 
intelligence and internality are psychological constructs—convenient titles given 
to hypothesized underlying psychological traits and capacities. A psychological 
construct is an abstract theoretical variable invented to explain some phenomenon 
of interest to scientists. Like most of the topics introduced in this book, the issue 
of constructs, in particular their measurement, is also the subject of many articles 
and books in its own right. (For more information, see Campbell and Fiske [1959] 
regarding the measurement of constructs. Also, for a contemporary problem in 
construct definition, see the discussion on emotional intelligence by Mayer and 
Salovey [1993].) 

The reader should be aware of the nebulous nature of these constructs and of the models that psychologists have devised to describe the relationships among constructs and external events. The insights that the constructs provide and the predictions that may be made from the models are potentially useful, even though the utility of a psychological construct or model is no guarantee of its underlying physical reality.* The reader should not dismiss the constructs as mere "psychobabble" or the models as gross oversimplifications of a complex world. Both can be taken as a means of helping us understand the world by framing it in familiar terms.

* This situation is not unique to psychological science. The Bohr model of the atom, proposed by Niels Bohr in 1913, is not completely correct, but has many features that are approximately correct and that make it useful for some discussions. At present, the generally accepted theory of the atom is called quantum mechanics; the Bohr model is an approximation to quantum mechanics that has the virtue of being much simpler. Perhaps in 100 years, the quantum mechanical model will be considered a quaint approximation to reality.

Before we leave the discussion of models, constructs, and theories, let us make one last point. Although specifically addressed to the role of Fitts's law, the comments of Pew and Baron (1983, p. 664) regarding models and theories are broadly applicable. They note:

There is no useful distinction between models and theories. We assert that there is a 
continuum along which models vary that has loose verbal analogy and metaphor at 
one end and closed-form mathematical equations at the other, and that most models lie 
somewhere in-between. Fitts’ law may be placed in this continuum. As a mathematical 
expression, it emerged from the rigors of probability theory, yet when transplanted into 
the realm of psychomotor behavior it becomes a metaphor. 
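For reference, Fitts' law in its original form predicts the time needed for a rapid aimed movement, such as reaching for a control. The prediction uses the distance D to the target and the target's width W: MT = a + b·log2(2D/W). The sketch below is illustrative only. The intercept and slope are hypothetical and would normally be fitted to data for a particular task. Other formulations, such as log2(D/W + 1), are also in use.

```python
import math


def movement_time(distance, width, intercept=0.1, slope=0.1):
    """Fitts' law: MT = a + b * log2(2D / W).

    distance (D) and width (W) must share the same units; intercept (a, seconds)
    and slope (b, seconds per bit) are hypothetical values.
    """
    if distance <= 0 or width <= 0:
        raise ValueError("distance and width must be positive")
    index_of_difficulty = math.log2(2 * distance / width)  # in bits
    return intercept + slope * index_of_difficulty


print(round(movement_time(160, 20), 3))  # index of difficulty = log2(16) = 4 bits
print(round(movement_time(320, 20), 3))  # twice as far: one more bit
```

Doubling the distance or halving the width adds one bit to the index of difficulty, so the two changes produce the same predicted movement time.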


1.9 SUMMARY 

In this chapter we have tried to introduce the student to some of the concepts and goals 
of psychology and to delineate some of the domain of aviation psychology. We have 
also outlined some of the dominant models of human performance that are currently 
applied to further our understanding of how humans perform in an aviation setting. 

Aviation psychology represents an amalgamation of the various approaches and 
subdisciplines within psychology. In the following chapters we will delve more 
deeply into the design and development of aviation systems, the selection and training
of pilots, and efforts to improve safety from a psychological perspective. Like
the rest of the aviation community, aviation psychologists share the ultimate goals of 
improving safety, efficiency, and comfort. Practitioners of aviation psychology bring 
to bear the tools and techniques of psychology to describe, predict, understand, and 
influence the aviation community to achieve those goals. 
2.1 INTRODUCTION

The purpose of research is to gain new knowledge. In order to be able to trust new findings, it is important that scientific methodology is used. The tools that researchers use should be described in sufficient detail that other researchers may conduct and replicate the study. In other words, an important principle is that the findings should be replicable, which means that they can be confirmed in new studies by other researchers. There are several scientific methods available, and the most important consideration is that the method is well suited to exploring the research question. Sometimes the best choice is an experiment, whereas at other times a survey may be the best choice.

Research ideas may come from many sources. Many researchers work within an 
area or research field, and a part of their research activity will be to keep updated 
on unresolved questions and unexplored areas. At other times, the researcher will get
ideas from his or her own life or from things that happen at work, or the researcher may
be asked to explore a specific problem.

Research can be categorized in many ways, for example, as basic or applied research. In basic research, the main purpose is to understand or explain a phenomenon without knowing whether the findings will be useful for anything. In applied research it is easier to see how the findings might be used. Frequently, the boundaries between these two types of research are unclear, and basic research may later be important as a background for applied research and for the development of products and services. As an example, basic research about how the human brain perceives and processes information may later be important in applied research and in the design of display systems or perhaps for developing tests for pilot selection.

Research should be free and independent. This means that the researcher should 
be free to choose research methods and to communicate the results without any form 
of censorship. To what extent the researcher is free to choose the research problem is 
partly dependent upon where the researcher works; however, frequently one important
practical limitation is lack of funding. Even though the researcher may have
good ideas for a project and have chosen appropriate methods, the project may not 
receive any funding. 
2.2 THE RESEARCH PROCESS

The research process normally consists of a series of steps (Figure 2.1) that the 
researcher proceeds through from problem description to final conclusion. The first 
step is usually a period in which ideas are formulated and the literature on the topic 
reviewed. The first ideas are then formulated in more detail as problems and hypotheses. Some problems may be purely descriptive, for example, the prevalence of fear of flying in the population. At other times, the purpose may be to determine the cause of something, for example, whether a specific course aimed at reducing fear of flying is in fact effective in doing so.

The next step will be to choose a method well suited for studying the research problem. Sometimes aspects other than the nature of the problem or hypothesis will influence the choice of method, for example, practical considerations, tradition, and ethical problems. Within certain disciplines, some methods are more popular than others, and the choice of methods may also depend on the training that the researcher has received. In other words, many aspects besides the nature of the research question are likely to be involved in choosing research methods and design. The next step in the research process involves data collection. Data may be quantitative, implying that something is measured or counted and may be processed using statistical methods, or qualitative, often involving words or text. The majority of research conducted within aviation is based on quantitative data, and this chapter will focus on how to collect, process, and interpret such data.



FIGURE 2.1 The research process.
2.3 LITERATURE REVIEW AND RESEARCH QUESTIONS

There are a large number of scientific journals in psychology, and some of them 
publish aviation-related research. One example is the International Journal of 
Aviation Psychology, which is an American journal published by Taylor & Francis. 
Another journal published by the U.S. Federal Aviation Administration (FAA) is 
the International Journal of Applied Aviation Studies, which is available online* 
In addition, a medical journal called Aviation, Space, and Environmental Medicine 
is published by the Aerospace Medical Association. A number of other psychology 
journals will publish articles in aviation psychology and other related areas of work 
and organizational psychology—for example, Human Factors, Journal of Applied 
Psychology, and Military Psychology. Articles from these journals may be accessed 
through a library. One can also visit the journal's Web site, but it may be necessary
to purchase the article, at least for more recent issues. In addition, the FAA has
produced a large number of reports over the years that are available online, free of 
charge. A general literature search using a regular search engine may also result in 
relevant articles and reports, but it is important to evaluate the source and quality of 
the information critically. 

Before they are published, scientific articles go through a peer-review
system. This means that two or three other researchers in the field will review the
article. The author then receives the feedback from the reviewers and the editor, and 
in most cases the article will have to be revised before it is published. Sometimes, 
the quality of the article is too poor compared to the standards of the journal, and the 
author is not given the option to revise and resubmit. The journals vary in relation to 
the proportion of submitted articles that are accepted and how often articles from the 
journal are cited by other authors. 

Research in aviation psychology is also presented at conferences. One of the organizers of such conferences is EAAP (European Association for Aviation Psychology). EAAP, an association of aviation psychologists that is now more than 50 years old, organizes conferences every 2 years. In addition, the International Symposium on Aviation Psychology is organized every 2 years and is usually held in Dayton, Ohio. Approximately every 3 years, the Australian Association for Aviation Psychology organizes a conference in Sydney. Proceedings are usually published after the conferences, and they include the papers presented at the conference, usually in the form of short articles. These are available for those attending the conference and sometimes also through the Web sites of the organizers. Researchers† often present their results at a conference before the results appear in a journal, so attending conferences may provide a snapshot of the latest news in the area.


* Web links for this and other sources are available in the listings given in Chapter 9.

† Researchers are usually thrilled to get a request for their publications, so if an article cannot be found through a library or on the Web, try sending a note to the author and asking for a copy. Many of the researchers in this area are members of the Human Factors and Ergonomics Society, and their e-mail addresses can be found through that organization.
2.4 RESEARCH PROBLEMS

Not all questions can be answered through research, and even if the questions or 
problems can be examined by scientific methods, it is often necessary to restrict the 
study to some selected questions. One type of research problem can be categorized 
as descriptive. This term is used when the purpose is to describe a phenomenon or 
how things co-vary. An example of a descriptive research problem would be to assess 
the level of stress among air traffic controllers or to examine whether a test can be 
used to select cabin crew members. Sometimes, in addition to an overall research 
problem, more specific questions may be formulated as hypotheses. Suppose a survey
on job stress is conducted among air traffic controllers. The researcher might
want to examine the hypothesis that there is an association between lack of social 
support and experienced stress. Such hypotheses are usually based on a theory or on 
earlier research findings. 

If a correlation between two variables (for example, between social support and stress) is discovered in a survey, it does not necessarily mean that a causal relationship exists between the two variables. There may be a causal link, but a correlation is not sufficient to establish this. In addition, we need to know that social support precedes the feeling of stress, and preferably we should know something about the mechanism behind the relationship, that is, how social support reduces stress or acts as a buffer against it. Perhaps a third variable affects both variables studied and is the real cause of variations in both social support and stress. If a causal relationship is the main focus of the study, then the research question will have to be formulated differently than in descriptive research, and it will also require a different research design from that used in descriptive research. The best way to examine causal problems is usually by means of an experiment.

2.5 VARIABLES 

Variables are aspects or attributes of a person or phenomenon studied; they may have different values depending on what is being measured. Some examples include the age of a person or the number of hours of flying experience. Alternatively, we may be interested in personal characteristics such as personality traits (for example, extroversion) or in whether the person has completed a CRM (crew resource management) course. Variables manipulated by the researcher are usually labeled independent variables (e.g., whether or not a person attends a course). The variables studied afterward (e.g., improved communication skills or fewer operational mistakes) are called dependent variables.

Variables also differ in how they are measured or assessed. Some variables are easy to assess; for example, age could be measured in years (at least for adults) or salary measured in dollars or euros. Both age and salary are continuous variables measured on a scale with equal intervals between the values. Many psychological variables are measured using a five-point or seven-point scale on which the person is asked to indicate how much she or he agrees with a statement. Sometimes fewer categories are used, and it is debatable whether such scales can be treated as continuous; this has implications for the statistical methods that may be used for analyzing the data.

Some variables are categorical; for example, gender has two categories (man/woman), whereas masculinity could be measured on a continuous scale. Numbers may also be used for categorical variables, but they would only indicate group membership (e.g., men are given the value 1 and women the value 2) and simply serve as labels. The numbers would only indicate that the groups are different, and the letters A and B could be used instead. In other words, variables may take many values, and the numbers carry different types of information depending on the type of variable and what is being measured.

2.6 DESCRIPTIVE METHODS AND MEASUREMENT 

Research usually requires measurement or some form of categorization. Some variables are easy to assess, and others are not. Many psychological constructs, such as intelligence, anxiety, or personality traits, are not easily observed. These constructs will have to be operationalized before they can be measured. Psychological tests are one way of measuring these constructs, in addition to interviews and observations.

2.6.1 Psychological Tests 

A psychological test is a standardized procedure for assessing the amount or magnitude of something. It could involve a handful of questions or an extensive procedure involving equipment and computers. Psychological tests may be used for many different purposes, including clinical use, personnel selection, and research. Regardless of purpose, it is important that the tests be of high quality, which usually involves three requirements: reliability, validity, and appropriate norms. Each of these concepts will be discussed in detail on the following pages.

There are many requirements for tests and test users. The European psychological 
associations have agreed on a common set of guidelines (International Guidelines 
for Test Use), which can be found on the Web sites of the national psychological 
associations or the International Test Commission. The American Psychological 
Association (APA 1999) has for many years published a book called The Standards 
for Educational and Psychological Testing, which explains reliability and validity in 
addition to providing guidelines for how tests should be used appropriately,
professionally, and in an ethical manner.

2.6.2 Classical Test Theory 

In psychology, we usually assume that a person’s test score, called the observed score, consists of two components: the person’s true score and an error term. This can be expressed as follows: observed score = true score + error. If the same person is tested several times under the same conditions, we would expect a similar but not identical test score every time; the observed scores will vary slightly from one occasion to the next. The smaller the error term, the smaller these variations will be. In addition, we assume that the error is unsystematic; for example, the error does not depend on the person’s true score. This model of measurement is called classical test theory (Magnusson 2003) and is the starting point for many of the psychological tests used today.
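A small simulation can make the model concrete. It is a sketch under assumed values (a true score of 100 and normally distributed, unsystematic error), not a procedure from the text, and `observed_scores` is a hypothetical helper. Repeatedly "testing" the same person shows the observed scores scattering around the true score, and scattering more widely when the error term is larger.

```python
import random
from statistics import mean, stdev

def observed_scores(true_score, error_sd, n_tests, seed=0):
    """Simulate repeated testing: observed = true + error, error independent of the true score."""
    rng = random.Random(seed)
    return [true_score + rng.gauss(0, error_sd) for _ in range(n_tests)]

small_error = observed_scores(100, error_sd=2, n_tests=1000, seed=1)
large_error = observed_scores(100, error_sd=10, n_tests=1000, seed=2)

# Both sets of observed scores average out close to the true score of 100 ...
print(round(mean(small_error), 1), round(mean(large_error), 1))
# ... but the spread from one testing occasion to the next grows with the error term.
print(round(stdev(small_error), 1), round(stdev(large_error), 1))
```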

A more recent development is item response theory (Embretson and Reise 2000), which represents a different way of studying test scores. It is the starting point of so-called adaptive testing, in which the difficulty of the items is tailored to the individual’s ability level. The theory is based on the assumption that the probability of answering an item correctly is a function of a latent trait or ability. Item response theory makes stronger assumptions than classical test theory, but the advantage is a more detailed analysis of the items. One disadvantage is the lack of user-friendly software for these analyses compared with classical test theory analyses, which can be performed with most statistical programs. In the years to come, more tests will certainly be developed based on this theory.
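Item response theory covers several models, and the text does not commit to one. As an assumption, the sketch below uses the common two-parameter logistic (2PL) model, in which the probability of a correct answer rises with the latent trait theta along an S-shaped curve set by the item's difficulty b and discrimination a. The function name `p_correct` is illustrative.

```python
import math

def p_correct(theta, difficulty, discrimination=1.0):
    """2PL model: P(correct | theta) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# A person whose ability equals the item's difficulty has a 50% chance of success.
print(p_correct(theta=0.0, difficulty=0.0))  # 0.5

# For a harder item (b = 1), the probability of success increases with ability.
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(p_correct(theta, difficulty=1.0), 3))
```

An adaptive test could use probabilities like these to pick the next item, for example one whose difficulty is close to the current ability estimate.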

2.6.3 Reliability 

Within classical test theory, reliability is defined as the correlation between two parallel tests, that is, two similar but not identical tests. (The correlation coefficient is a statistical measure of covariation between two variables; it is discussed further in Section 2.8 on statistics.) If we test a group of people twice using two parallel tests, we expect a large correlation between the two sets of scores if the error term is small. However, it is often difficult and time consuming to create parallel tests, so other approaches to examining the reliability of test scores are needed.

One way to estimate reliability is to test the same group of people twice with the same test (e.g., a few months apart). This is called test-retest reliability. Another common approach is to divide the test into two halves, for example, by assigning every other question in each section to one half and the remaining questions to the other. A group of people is then tested, and two scores are calculated for each person, one for each half of the test. Finally, the correlation between the two halves is calculated. However, this correlation estimates the reliability of a test with only 50% of the items, that is, half the length of the original test. It can be corrected using a formula (the Spearman-Brown formula) that provides an estimate of the reliability of the entire test.
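The correction for test length is usually made with the Spearman-Brown formula, which predicts the reliability of a test lengthened by a factor k as k·r / (1 + (k - 1)·r); for a split-half correlation, k = 2. A minimal sketch, where the half-test correlation of .60 is an arbitrary illustration:

```python
def spearman_brown(r_part, length_factor=2.0):
    """Predicted reliability when a test is lengthened by `length_factor` (2 = from half to full)."""
    return (length_factor * r_part) / (1 + (length_factor - 1) * r_part)

# A correlation of .60 between the two halves corresponds to a full-length reliability of .75.
print(round(spearman_brown(0.60), 2))  # 0.75
```

The same formula also shows why longer tests tend to be more reliable: tripling a test whose reliability is .50 is predicted to raise it to .75.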

Of course, we can divide the test into two halves in many ways, depending on how the questions or items are split. A frequently used form of reliability is Cronbach’s alpha, which is a kind of average of all the possible split-half reliability estimates for a given measure. The various types of reliability provide different types of information about the test scores. Test-retest reliability says something about stability over time, whereas split-half reliability and Cronbach’s alpha provide information about internal consistency. The calculated correlations should be as high as possible, preferably at least .70 or .80, although lower values are sometimes accepted. One factor affecting test reliability is the number of questions: all else being equal, the more questions there are, the higher the reliability. In addition, it is important that the test conditions and scoring procedures be standardized, which means that clear and well-defined procedures are used for all the subjects.
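Cronbach’s alpha is computed from the item variances and the variance of the total score: alpha = k/(k - 1) · (1 - sum of item variances / total-score variance), where k is the number of items. The sketch below applies that formula to a small hypothetical data set (five respondents, three items); the data and the `cronbach_alpha` helper are illustrative only.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha from a list of items, each a list of scores ordered by respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    sum_item_variances = sum(pvariance(item) for item in items)
    return (k / (k - 1)) * (1 - sum_item_variances / pvariance(totals))

# Hypothetical answers from five respondents to a three-item scale (1-5 ratings).
items = [
    [2, 4, 3, 5, 1],  # item 1
    [3, 4, 3, 5, 2],  # item 2
    [2, 5, 4, 4, 1],  # item 3
]
print(round(cronbach_alpha(items), 2))  # 0.94
```

Population and sample variances give the same alpha here, because the N or N - 1 denominators cancel in the ratio.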
2.6.4 Validity

The most important form of test validity is construct validity. This refers to the extent
to which the test measures what it purports to measure. Does an intelligence test
measure intelligence or is the test only a measure of academic performance? There is 
no simple solution to the problem of documenting adequate construct validity. Many 
strategies may be used to substantiate that the test actually measures what we want it 
to measure. If, for example, we have developed a new intelligence test, then one way 
to examine construct validity would be to investigate the relationship between the 
new test and other well-established intelligence tests. 

Another form of validity is criterion-related validity. This refers to the extent to 
which the test predicts a criterion. If the criterion is measured about the same time 
as the test is administered, then the term “concurrent validity” is used, in contrast to 
“predictive validity,” which is used when a certain time period has passed between 
testing and measurement of the criterion. Predictive validity asks the question: Can the test scores be used to predict future work performance? Predictive validity is usually examined by calculating the correlation between the test scores and a measure of performance (the criterion).

Content validity, the third type of validity, concerns the extent to which the test items or questions cover the relevant domain to be tested. Do the exam questions cover the whole area to be examined, or are some parts left out?

These three forms of validity may seem quite different, and this classification has received some criticism (Guion 1981; Messick 1995). An important objection has been that it is not the test itself that is valid; rather, the conclusions that we draw on the basis of test scores must be valid. Messick (1995) has long argued that the examination of test validity should be expanded to include the value implications inherent in the interpretation of a test score as well as the social consequences of testing. This would involve evaluating whether the use of a particular test has unfortunate consequences, for example, that certain groups are systematically not selected when the test is used to select people for a given job or educational program.

2.6.5 Test Norms and Cultural Adaptation 

In addition to reliability and validity, it is often desirable that the test be standardized. This means that the test scores for a large sample of subjects are known, so that a person’s score may be compared with the mean of these scores. Sometimes it may be appropriate to use a random sample of the population when establishing test norms, whereas at other times more specialized groups are more relevant. Imagine a situation where a person is tested using an intelligence test and the result is calculated as the number of correct responses. Unless we compare the result with something, it is hard to know whether the person performed well or not. The result could be compared with the average number of correct responses among other adults in the same age group and from the same country. The tests should be administered in the same way for everyone who has been tested: everyone receives the questions in the same order, with the same instructions, and with the same scoring procedures.
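Comparing a result with norms is often done by expressing the raw score as a standard score (z-score), that is, its distance from the norm-group mean in standard deviation units. The sketch below assumes hypothetical norms (mean 30 correct responses, SD 5) and, for the percentile, that norm-group scores are roughly normally distributed; `standard_score` is an illustrative name.

```python
from statistics import NormalDist

def standard_score(raw, norm_mean, norm_sd):
    """Position of a raw score relative to the norm group, in SD units (z-score)."""
    return (raw - norm_mean) / norm_sd

# Hypothetical norms: adults in the same age group average 30 correct responses (SD 5).
z = standard_score(38, norm_mean=30, norm_sd=5)
print(z)  # 1.6
# If the norm scores are approximately normal, this is roughly the 95th percentile.
print(round(NormalDist().cdf(z) * 100))  # 95
```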
This is an important principle if performance is to be compared. If a test was not developed, and its norms established, in the country where it will be used, then it is necessary to translate and adapt the test to the new conditions. For example, when a test developed in the United States is used in Norway, the test needs to be translated into Norwegian. A common procedure is first to translate the test into Norwegian and then to have another bilingual person translate it back into the original language. This procedure may reveal problems in the translation that should be resolved before the test is used. Even if we can come up with a good word-for-word translation, test reliability and validity should be examined again, and national norms should be established.

2.6.6 Questionnaires 

A questionnaire is an efficient way to collect data from large groups. It also makes it easier to ensure that respondents feel they can answer anonymously (e.g., compared with an interview). A questionnaire may consist of several parts or sections (e.g., one section on background information such as age, gender, education, and experience). The questionnaire will include specific questions designed for the purpose and may also include more established scales that, for example, measure personality characteristics such as extroversion. The questions can be open ended or have fixed response options. Sometimes a five-point or seven-point scale is used, on which people indicate their opinion by marking one of the options. If the questionnaire includes established scales, for example, to measure satisfaction in the workplace, it is important to keep the wording and response options identical to those of the original measurement instrument. Changes may alter the psychometric properties of the instrument and make comparisons with other studies difficult.

Formulating good questions is an art, and a good questionnaire often has a lot of work behind it. It is important to avoid leading or ambiguous questions, to use simple language, and to avoid professional terminology and technical expressions. A questionnaire that looks appealing and includes clear questions increases the response rate. The length of the questionnaire is also related to the response rate, so shorter questionnaires are preferred; designers must therefore think carefully about whether all the questions really are necessary. A good summary of advice on formulating questions and conducting a survey is provided by Fink and Kosecoff (1985).

Researchers would like the response rate (the proportion of people who actually complete the survey and/or send back the questionnaire) to be as high as possible. Some methods books claim that it should be at least 70%, but this proves difficult to achieve in practice, even after a reminder has been sent to all the respondents. A meta-analysis of studies in clinical and counseling psychology summarized 308 surveys and found an average response rate of 49.6% (Van Horn, Green, and Martinussen 2009). The meta-analysis showed that the response rate increased by an average of 6% after the first reminder. Response rates also declined over the 20-year period covered by the meta-analysis (1985-2005).
2.6.7 Internet

Many surveys are conducted via the Internet, and various programs can be used to create Internet-based questionnaires. Some of the programs must be purchased, but others can be downloaded free of charge via the Web. One example of free software is Modsurvey, which was developed by Joel Palmius in Sweden (http://www.modsurvey.org/). Participants in Internet surveys may be recruited by sending them an e-mail or by publicizing the address of the Web site in various ways. It is often difficult to determine the response rate in online surveys. This is partly because e-mail addresses change more frequently than residential addresses, so it is difficult to know how many people actually received the invitation. In cases where participants are recruited through other channels, it may also be difficult to determine how many people were actually informed about the survey.

Internet surveys are becoming very popular because they are efficient and save money on printing, postage, and manual data entry. However, these surveys may not be the best way to collect data for all topics and all participant groups. Not everyone has access to a computer, and not everyone feels comfortable using one for such purposes.

2.6.8 Interview 

An interview can be used for personnel selection and as a data collection method. Interviews may be more or less structured, that is, the questions and their order may be determined in advance to a greater or lesser extent. When exploring new areas or topics, it is probably best for the questions to be reasonably open; at other times, specific questions should be formulated and, in some instances, both the questions and the response options will be given. If the questions are clearly formulated in advance, it will probably be sufficient for the interviewer to write down the answers. During extensive interviews, it may be necessary to use a tape recorder, and the interview will have to be transcribed later. An interview is obviously more time consuming to process than a questionnaire, but it is probably more useful when complex issues have to be addressed or new themes explored. It may therefore be wise to conduct some interviews with good informants before developing a questionnaire.

Before the interview begins, an interview guide with all the questions is usually constructed, and if multiple interviewers are used, they should all receive the necessary training so that the interviews are conducted in the same way. It is also important that interviewers are aware of the possible sources of error in the interview and of how their own behavior may influence the informants.

2.6.9 Observation 

Like the interview, observation of people may be more or less structured. In a structured observation, the behavior to be observed is specified in advance, and there are clear rules for what should be recorded. An example would be an instructor who is evaluating pilots’ performance in a simulator. In that case, the categories to be rated should be specified, and what constitutes good and poor performance should be outlined in advance. The observers may be part of the situation, and the observed person may not even be aware of being observed. This is called hidden observation.

An obvious problem with observation is that people who are observed may be influenced by the fact that an observer is present. A classic experiment that illustrates this is the Hawthorne study, in which workers at a U.S. factory producing telephone equipment were studied. The purpose was to see whether various changes in lighting, rest periods, and other working conditions increased the workers’ production. Irrespective of the changes implemented, whether the lighting was increased or reduced, production increased. One interpretation of these findings was that being observed and receiving attention affected individuals’ job performance. The study is described in most introductory textbooks in psychology. The tendency for people to change their behavior when being observed has subsequently been called the “Hawthorne effect,” after the factory where these original studies were conducted in the 1920s.

The study has since been criticized because the researchers failed to consider a 
number of other factors specific to the workers who participated in the study. One 
difference was that women who took part in the study gained feedback on their 
performance and received economic rewards compared to the rest of the factory 
workers (Parsons 1992). This shows that studies may be subject to renewed scrutiny 
and interpretation more than 60 years after they have been conducted. Regardless of 
what actually happened in the Hawthorne plant, it is likely that people are influenced 
by the fact that they are observed. One way to prevent this problem is to conduct a so-called hidden observation. This is ethically problematic, especially if the researcher is participating in the group being observed. The situation is different if large groups are observed in public places and the individuals cannot be identified. For example, if the researcher is interested in how people behave at a security checkpoint, this may be an effective and ethical research method.

Often, at the beginning of a study, those who are observed are probably aware that an additional person is present in the situation. After some time, however, the influence probably diminishes as the participants get used to having someone there and get busy with the work tasks to be performed.

2.7 EXPERIMENTS, QUASI-EXPERIMENTS, AND CORRELATION RESEARCH

In a research project, it is important to have a plan for how the research should be 
conducted and how the data should be collected. This is sometimes called the design 
of the study; we will describe three types of design. 

2.7.1 Experiments 

Many people probably picture an experiment as something that takes place in a 
laboratory with people in white coats. This is not always the case, and the logic 
behind the experiment is more crucial than the location. An important feature of an 
experiment is the presence of a control group. The term “control group” is used for the group that receives no treatment or intervention. The control group is compared with the treatment group, and the differences between the groups can then be attributed to the treatment that one group received and the other did not. This requires that the groups be similar, and the best way to ensure this is by randomly assigning subjects to the two conditions.
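Random assignment can be done by shuffling the list of participants and splitting it in two, so that chance alone decides who ends up in which condition. A minimal sketch; the participant labels and the helper name `randomly_assign` are hypothetical, and a fixed seed is used only to make the split reproducible.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle a copy of the participant list and split it into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

participants = [f"P{i}" for i in range(1, 21)]
treatment, control = randomly_assign(participants, seed=7)
print("Treatment:", treatment)
print("Control:  ", control)
```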

Another important feature of an experiment is that the researcher can control the 
conditions to which people are exposed. Sometimes, several experimental groups 
receive different interventions. For example, a study may include two types of training, both of which will be compared to the control group that receives no training.
Alternatively, there may be different levels of the independent variable; for example, 
a short course may be compared to a longer course. 

2.7.2 Quasi-Experiments 

A compromise between a true experiment and correlational studies is a so-called 
quasi-experimental design. The purpose of this design is to mimic the experiment 
in as many ways as possible. An example of a quasi-experiment is to use a comparison group and a treatment group to which the participants have not been randomly
assigned. It is not always possible to allocate people at random to experimental and 
control conditions. Perhaps those who sign up first for the study will have to be 
included in the experimental group, and those who sign up later will have to be 
included in the comparison group. Then the researcher has to consider the possibility 
that these groups are not similar. 

One way to explore this would be to do some pretesting to determine whether 
these groups are more or less similar in relation to important variables. If the groups 
differ, this may make it difficult to draw firm conclusions about the effect of the 
intervention. There are other quasi-experimental designs, such as a design without 
any control or comparison group. One example would be a pretest/posttest design 
where the same people are examined before and after the intervention. The problems 
associated with this design will be addressed in the section on validity. 

2.7.3 Correlational Research 

It is not always possible to conduct a real experiment, for both practical and ethical reasons. For example, it may not be possible to design an experiment in which the amount of social support employees receive from their leader is manipulated. Most people would view this approach as unethical, but studying natural variation in this phenomenon is possible. Research on working conditions will often use a correlational design. The purpose may be to map out various work demands, such as workload, and outcomes, such as burnout. After these variables have been measured, various statistical techniques may be used to study the relationships between them. More complex models of how multiple variables are connected with burnout can also be studied, as can the extent to which burnout can be predicted from work-related factors and personal characteristics.
2.8 STATISTICS

An important part of the research process occurs when the results are processed. 
If the study includes only a few subjects, then it is easy to get an overview of the 
findings. However, in most studies, a large number of participants and variables are 
included, so a tool is clearly needed to get an overview. Suppose that we have sent out 
a questionnaire to 1,000 cabin crew members to map out what they think about their 
working environment. Without statistical techniques, it would be almost impossible 
to describe the opinions of these workers. 

Statistics can help us with three things: 

• sampling (how people are chosen and how many people are required); 

• describing the data (graphically, in terms of variation, and in terms of the most typical response); and

• drawing conclusions about parameters in the population. 

In most cases, we do not have the opportunity to study the entire population (e.g., all pilots or all passengers), and a smaller group needs to be sampled. Most statistical methods and procedures assume a random selection of subjects, which means that every member of the population has an equal chance of being selected. There are also other ways to sample subjects; for example, a stratified sample may be used, in which the population is divided into strata and subjects are then randomly selected from each stratum. These sampling methods are primarily used in surveys where one is interested in investigating, for example, how many people sympathize with a political party or the extent of positive attitudes toward environmental issues. One application of statistics is thus to determine how the sampling should be done and, not least, how many people are needed in the study.
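Stratified sampling can be sketched as grouping the population by stratum and then drawing a simple random sample within each group. The population below (200 pilots and 800 cabin crew members) and the helper `stratified_sample` are hypothetical, and the sketch draws the same number from every stratum; drawing in proportion to stratum size is an equally common alternative.

```python
import random

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Split the population into strata, then draw a simple random sample within each one."""
    rng = random.Random(seed)
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

# Hypothetical population: 200 pilots and 800 cabin crew members.
population = [("pilot", i) for i in range(200)] + [("cabin crew", i) for i in range(800)]
sample = stratified_sample(population, stratum_of=lambda person: person[0], n_per_stratum=20, seed=3)
for stratum, members in sample.items():
    print(stratum, len(members))
```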

After the data are collected, the next step is to describe the results. There are many possibilities, depending on the problem and what types of data have been collected. The results may be presented in terms of percentages, rates, means, or perhaps a measure of association (correlation). Graphs or figures summarizing the data could also be used.

The third and last step is inference from the sample to the population. Researchers are usually not satisfied with just describing the specific sample; rather, they want to draw conclusions about the entire population on the basis of findings observed in the sample. One way to do this is through hypothesis testing, in which a hypothesis about the population is formulated and then tested using results from the sample.

The second procedure involves estimating the results in the population on the basis of the results in the sample. Suppose we are interested in knowing the proportion of people with fear of flying. After conducting a study measuring this attitude toward flying, we will have a concrete number: the proportion of our subjects who said they were afraid of flying. Lacking any better estimate, it would then be reasonable to suggest that approximately the same proportion of the population at large has a fear of flying. In addition, we might propose an interval that is likely to capture the true proportion of people with fear of flying in the population. Such intervals are called confidence intervals. The more people included in the sample, the narrower these intervals are; that is, the larger our sample, the more precise our estimate.
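How the interval narrows with sample size can be seen with the simple normal-approximation (Wald) interval for a proportion, p ± 1.96·sqrt(p(1 - p)/n). This is one of several possible interval methods (the Wilson interval is often preferred for small samples), and the survey figures below are hypothetical.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Approximate 95% CI for a population proportion (normal/Wald approximation)."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Hypothetical surveys in which 20% of respondents report fear of flying.
for successes, n in [(20, 100), (80, 400), (320, 1600)]:
    low, high = proportion_ci(successes, n)
    print(f"n = {n:4d}: {low:.3f} to {high:.3f}")
```

With 100 respondents the interval runs from about .12 to .28; with 1,600 it shrinks to about .18 to .22, because the margin of error falls with the square root of n.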

2.8.1 Descriptive Statistics 

The most common measure of central tendency is the arithmetic mean. It is commonly used when something is measured on a continuous scale. The arithmetic mean is usually denoted by the letter M (for “mean”) or X̄. An alternative is the median, which is the middle value after all the values have been ranked from lowest to highest. The median is a good measure if the distribution is skewed, for example, if the results include some very high or very low values. The mode is a third indicator of central tendency; it is simply the value with the highest frequency. Thus, the variable does not need to be continuous for this measure of central tendency to be used.
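The three measures can be compared on the fear-of-flying ratings from the constructed data set in Table 2.1, using Python's standard `statistics` module (a convenience choice; any statistics program gives the same values).

```python
from statistics import mean, median, mode

fear = [5, 4, 4, 3, 3, 2, 2, 1, 1, 1]  # fear-of-flying ratings from Table 2.1

print(mean(fear))    # 2.6
print(median(fear))  # 2.5: average of the two middle values, 2 and 3
print(mode(fear))    # 1: the most frequent rating
```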

In addition to a measure of the most typical value in the distribution, it is also important to have a measure of variation. If we have calculated the arithmetic mean, it is common to use the standard deviation as a measure of variation. Formulated a little imprecisely, the standard deviation is the average deviation from the mean. If the results are normally distributed (that is, bell shaped) and we move one standard deviation above and one standard deviation below the mean, then about two-thirds of the observations fall within that range. If we include two standard deviations on both sides of the mean, about 95% of the observations fall within the range. The formulas for calculating the arithmetic mean and the standard deviation are

$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}, \qquad S = \sqrt{\frac{\sum_{i=1}^{N} \left(X_i - \bar{X}\right)^2}{N - 1}}$$

where

N = sample size
X_i = test score of person i
X̄ = mean test score
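The two formulas translate directly into code. The sketch below uses N - 1 in the denominator of the standard deviation, which reproduces the SPSS value for fear of flying in Table 2.2, and then uses `statistics.NormalDist` to check the "about two-thirds" and "about 95%" rules for a normal distribution. The helper names are illustrative.

```python
import math
from statistics import NormalDist

def arithmetic_mean(scores):
    return sum(scores) / len(scores)

def standard_deviation(scores):
    """Sample standard deviation with N - 1 in the denominator."""
    m = arithmetic_mean(scores)
    return math.sqrt(sum((x - m) ** 2 for x in scores) / (len(scores) - 1))

fear = [5, 4, 4, 3, 3, 2, 2, 1, 1, 1]  # fear-of-flying ratings from Table 2.1
print(arithmetic_mean(fear), round(standard_deviation(fear), 5))  # 2.6 1.42984

# Share of a normal distribution lying within 1 and within 2 SDs of the mean.
nd = NormalDist()
print(round(nd.cdf(1) - nd.cdf(-1), 3))  # 0.683
print(round(nd.cdf(2) - nd.cdf(-2), 3))  # 0.954
```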

As an example, we have constructed a data set for 10 people (Table 2.1). Assume that people have answered a question in which they are asked to rate their fear of flying on a scale from 1 to 5; the higher the score, the greater the discomfort. In addition, age and gender are recorded. Remember that these data have been fabricated for illustration. Suppose we are interested in describing the group in terms of demographic variables and the level of fear of flying. Both age and fear can be treated as continuous variables, so the arithmetic mean and standard deviation are appropriate measures of central tendency and variation. If a statistical program is used to analyze the data, each person must be represented as a row in the data file, and the columns represent the different variables in the study. If a statistics program widely used in the social sciences and medicine (Statistical Package for the Social Sciences, SPSS) is applied, the output will look like that in Table 2.2.

TABLE 2.1
Constructed Data Set

Person   Fear of Flying   Age   Sex
1        5                30    Female
2        4                22    Female
3        4                28    Male
4        3                19    Male
5        3                20    Female
6        2                21    Female
7        2                22    Male
8        1                24    Male
9        1                23    Male
10       1                21    Male

The computed means and standard deviations for fear of flying and age are shown in the Mean and Std. Deviation columns, respectively, together with the minimum and maximum scores for each variable. For the variable gender, it is not appropriate to calculate a mean score; the best way to describe gender is to specify the number of women and men included in the study. If the sample is large, it would be wise to give this as a percentage.

TABLE 2.2
SPSS Output

Descriptive Statistics

                     N    Range   Min.    Max.    Sum      Mean      Std. Deviation
Fear of flying       10   4.00    1.00    5.00    26.00    2.6000    1.42984
Age                  10   11.00   19.00   30.00   230.00   23.0000   3.49603
Valid N (listwise)   10

Sex

               Frequency   %       Valid %   Cumulative %
Valid  F       4           40.0    40.0      40.0
       M       6           60.0    60.0      100.0
       Total   10          100.0   100.0
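The descriptive part of Table 2.2 can be reproduced in a few lines. This sketch uses Python's standard library rather than SPSS, with the Table 2.1 data typed in by hand (sex coded F/M as in the SPSS output); the `describe` helper is illustrative and its SD uses N - 1, as SPSS does.

```python
from collections import Counter
from statistics import mean, stdev

# Table 2.1, one row per person: (person, fear of flying, age, sex)
data = [
    (1, 5, 30, "F"), (2, 4, 22, "F"), (3, 4, 28, "M"), (4, 3, 19, "M"), (5, 3, 20, "F"),
    (6, 2, 21, "F"), (7, 2, 22, "M"), (8, 1, 24, "M"), (9, 1, 23, "M"), (10, 1, 21, "M"),
]

def describe(values):
    """N, range, minimum, maximum, sum, mean and sample SD for a continuous variable."""
    return {
        "N": len(values), "Range": max(values) - min(values), "Min": min(values),
        "Max": max(values), "Sum": sum(values), "Mean": mean(values),
        "SD": round(stdev(values), 5),
    }

fear = [row[1] for row in data]
age = [row[2] for row in data]
print(describe(fear))
print(describe(age))

# Sex is categorical, so report counts and percentages instead of a mean.
counts = Counter(row[3] for row in data)
for sex, n in sorted(counts.items()):
    print(sex, n, f"{100 * n / len(data):.1f}%")
```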

The researcher may also be interested in exploring the relationship between 
age and fear of flying. Because both variables are continuous, the Pearson product-moment correlation may be used as an index of association. The correlation
coefficient indicates the strength of the relationship between two variables. It is a 
standardized measure that varies between -1.0 and +1.0. The number indicates the 
strength of the relationship and the sign indicates the direction. A correlation of 0 
means that there is no association between the variables, while a correlation of -1.0 
or 1.0 indicates a perfect correlation between variables in which all points form a 
straight line. Most correlations that we observe between two psychological variables 
will be considerably lower than 1.0. (See Table 2.4 for a description of what can be 
labeled a small, medium, or high correlation.) 

Whenever a correlation is positive, it means that high values on one variable are 
associated with high values on the other variable. A negative correlation means that 
high values on one variable are associated with low values for the other variable. 
Correlation actually describes the extent to which the data approximate a straight 
line, and this means that if there are curvilinear relationships, the correlation coefficient is not an appropriate index. It is therefore wise to plot the data set before
performing the calculations. 

The formula for calculating the product-moment correlation is

$$r = \frac{\sum_{i=1}^{N} \left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{(N - 1)\, S_X S_Y}$$

where

S_X, S_Y are the standard deviations of the two variables
X̄ and Ȳ are the mean scores
N is the sample size
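Applied to age and fear of flying from Table 2.1, the product-moment formula gives the value of about .51 reported below. The sketch computes it step by step, dividing the sum of cross-products by (N - 1) times the two standard deviations; `pearson_r` is an illustrative helper, not an SPSS function.

```python
import math

def pearson_r(xs, ys):
    """Sum of cross-products of deviations divided by (N - 1) * S_x * S_y."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cross / ((n - 1) * sx * sy)

age = [30, 22, 28, 19, 20, 21, 22, 24, 23, 21]  # Table 2.1
fear = [5, 4, 4, 3, 3, 2, 2, 1, 1, 1]
print(round(pearson_r(age, fear), 2))  # 0.51
```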

In the example in Figure 2.2, the plot indicates a tendency for higher age to be 
associated with a higher score in fear of flying (i.e., a positive correlation). According 
to the output, the observed correlation is .51, which is a strong correlation between 
the variables. In addition, a significance test of the correlation is performed and 
reported in the output, but we will return to this later. 

Sometimes we want to examine in more detail the nature of the relationship 
between the two variables beyond the strength of the relationship. For example, 
we may want to know how much fear of flying would increase when increasing 
the age by 1 year (or perhaps 10 years). Using a method called regression analysis, 
we can calculate an equation for the relationship between age and fear of flying 
that specifies the nature of the relationship. Such a regression equation may have 
one or more independent variables. The purpose of the regression analysis is to predict as much as possible of the variation in the dependent variable, in this case fear of flying.

FIGURE 2.2 Correlation: scatter plot and calculations. (Scatterplot of fear of flying against age, with age from 18 to 30 on the horizontal axis, and the accompanying correlation output.)
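With a single independent variable, the least-squares regression line can be computed from the sums of cross-products and squared deviations. The sketch below is a minimal illustration on the Table 2.1 data (not SPSS output), answering the question of how much predicted fear of flying changes per year, or per 10 years, of age; `simple_regression` is a hypothetical helper.

```python
def simple_regression(xs, ys):
    """Least-squares line: predicted y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

age = [30, 22, 28, 19, 20, 21, 22, 24, 23, 21]  # Table 2.1
fear = [5, 4, 4, 3, 3, 2, 2, 1, 1, 1]
intercept, slope = simple_regression(age, fear)
print(round(slope, 3))       # 0.209: predicted change in fear per extra year of age
print(round(10 * slope, 2))  # 2.09: predicted change per 10 years
print(round(intercept, 2))   # -2.21
```

With only 10 constructed cases, the equation is at best a description of this sample, and it is meaningful only within the observed age range (19 to 30).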

The correlation coefficient is the starting point for a series of analyses, including factor analysis. The purpose of factor analysis is to find a smaller number of factors that explain the pattern of correlations among many variables. Suppose we have tested a group of people with a range of ability tests and correlated the test scores with each other. Some of the tests will be highly correlated with each other, while others will be less correlated. Using factor analysis, it is possible to identify a number of factors, smaller than the number of tests, that explain the pattern in the correlation matrix. Perhaps the tests fall into two groups: one consisting of tests that measure mathematical abilities and one consisting of tests that measure verbal abilities. The factor analysis may then identify two underlying factors, which might be named after the tests most strongly associated with each of them. Here one could perhaps say that the tests measure two abilities, namely, linguistic and mathematical skills.

There are two main strategies for performing a factor analysis: exploratory factor analysis and confirmatory factor analysis. In the first case, a factor structure is suggested by the statistical program based on the specific intercorrelations in the data set. In confirmatory factor analysis, the researcher specifies the number of factors and the structure among them. The model is then compared with the actual data, and the correspondence between model and data is examined.

Returning to our data set, we may want to determine whether any gender differences 
exist. Based on the SPSS output in Table 2.3, we see that women in this group score 
higher on fear of flying than men, although the standard deviations are very similar. 
Sometimes it can be difficult to evaluate whether such a difference is big or small; 
this obviously depends on how well one knows the scale used. An alternative is to 
transform the difference between the means to a more familiar scale, for example, to 
units of standard deviation. These standardized differences are called effect sizes (ES) 
and are calculated as 

$$ES = \frac{\bar{X} - \bar{Y}}{SD_{pooled}}$$

In this calculation, the difference between the means is divided by a pooled standard 
deviation. The standard deviations in this example are approximately equal for the two 
groups, and we can use $SD_{pooled} = 1.3$. ES is then $(3.5 - 2.0)/1.3 = 1.15$. This can be 
described as a large difference, and it means that the difference between women and 
men is slightly more than one standard deviation. In addition, the confidence interval 
for the difference is wide, ranging from -0.398 to 3.398. A 95% confidence interval is 
constructed so that it captures the actual difference between women and men in the 
population with high certainty. The range also includes zero, which means that we 
cannot rule out the possibility that the difference between women and men in the 
population is zero. 


TABLE 2.3 
SPSS Output: t-Test of Differences between Means 

Group Statistics (Fear of Flying) 

Sex       N    Mean     Std. Deviation   Std. Error Mean 
Female    4    3.5000   1.29099          .64550 
Male      6    2.0000   1.26491          .51640 

Independent Samples Test (Fear of Flying) 

                               Levene's Test for 
                               Equality of Variances   t-Test for Equality of Means 
                               F       Sig.            t       df      Sig. (two-tailed) 
Equal variances assumed        .000    1.000           1.823   8       .106 
Equal variances not assumed                            1.815   6.477   .116 

                               Mean         Std. Error   95% Confidence Interval 
                               Difference   Difference   of the Difference 
                                                         Lower      Upper 
Equal variances assumed        1.50000      .82285       -.39750    3.39750 
Equal variances not assumed    1.50000      .82664       -.48715    3.48715 
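As a check on the numbers in Table 2.3, the Python sketch below recomputes three quantities from the group statistics: the effect size, the pooled-variance t statistic, and the 95% confidence interval. It is an illustration only, and the function names are ours. The pooled standard deviation uses the usual weighted formula. That formula gives about 1.27 rather than the rounded 1.3 used in the text, so the sketch reports an ES of about 1.18. The critical value 2.306 (two-tailed, .05, df = 8) is taken from a standard t table rather than computed.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def effect_size(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference: (mean1 - mean2) / pooled SD."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


def t_test_equal_var(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """t, df, standard error and CI for the difference, equal variances assumed."""
    se = pooled_sd(sd1, n1, sd2, n2) * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    ci = (diff - t_crit * se, diff + t_crit * se)
    return diff / se, n1 + n2 - 2, se, ci


# Group statistics (mean, SD, N) from Table 2.3
women = (3.5, 1.29099, 4)
men = (2.0, 1.26491, 6)

es = effect_size(*women, *men)
t, df, se, ci = t_test_equal_var(*women, *men, t_crit=2.306)  # 2.306 from a t table, df = 8
print(f"ES = {es:.2f}, t({df}) = {t:.3f}, SE = {se:.5f}, "
      f"95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

The printed t, standard error, and interval agree with the first row of Table 2.3 to rounding.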

2.8.2 Inferential Statistics 

We have now described the variables and also the associations between some of 
these variables for the sample. The next step would be to draw conclusions about the 
population from which these individuals were randomly picked. Several interesting 
hypotheses can be tested. For one, we found a positive correlation between age and 
fear of flying. The question is: Will there also be a correlation in the population? 

This question is restated in the form of a null hypothesis (H0). The null hypothesis 
in this example is that the correlation is zero in the population. The alternative 
hypothesis (H1) would be that the correlation is greater than zero or less than zero. It 
is now possible to calculate a number (a test statistic) that will help us decide whether 
our current results support the null hypothesis or the alternative hypothesis. 

In Figure 2.2, the results from this significance test are presented. The number in 
Figure 2.2 indicates the probability of observing the correlation we have obtained, 
or a stronger one, if the true correlation in the population is zero (i.e., H0 is true). 
The probability in this case is .13, which is higher than what we would usually 
accept. As a rule, the calculated probability should be lower than .05, and preferably 
lower than .01. This limit is chosen in advance and is called the significance level. By 
choosing a given significance level, we decide how “strict” we will be. To put it in 
a slightly different way: How much evidence is required before the null hypothesis 
can be discarded? In this example, we cannot reject the null hypothesis because 
.13 > .05. It is important to remember that the sample in this case is very small (only 
10 people); for most purposes, this is too small a sample for a study. 
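To see why 10 people is too few, one can ask how large a correlation must be before it reaches significance. The sketch below converts a correlation to the usual t statistic with n - 2 degrees of freedom, t = r√(n - 2)/√(1 - r²). It then inverts that relation to find the smallest significant |r|. The critical t values are copied from a standard t table (two-tailed, .05) rather than computed, and the function names are our own.

```python
import math


def r_to_t(r, n):
    """t statistic for testing H0: population correlation = 0 (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def critical_r(t_crit, n):
    """Smallest |r| that reaches significance, given the critical t for df = n - 2."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


# Two-tailed critical t values at .05, taken from a t table (keyed by sample size n)
T_CRIT = {10: 2.306, 30: 2.048, 100: 1.984}

for n, t_crit in T_CRIT.items():
    print(f"n = {n:3d}: |r| must exceed {critical_r(t_crit, n):.3f}")
```

With n = 10, a correlation must exceed about .63 in absolute value to be significant at the .05 level. With 100 people, about .20 suffices.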

If we want to test whether the difference between women and men is significant, 
we also need to formulate a null hypothesis. The null hypothesis in this case will be 
that the difference between the mean scores for women and men is zero in the 
population (H0: mean score for women in the population - mean score for men in the 
population = 0). This is another way of saying that the two means are equal. In 
addition, we need an alternative hypothesis in case the null hypothesis is rejected. 
This hypothesis would be that the difference between the means is different from zero. 

Sometimes we choose a specific direction for the alternative hypothesis based on 
previous findings or theory. Suppose previous studies have shown that women score 
higher than men on measures of anxiety and depression. We could then formulate 
an alternative hypothesis stating that the difference is greater than zero (H1: mean 
score for women in the population - mean score for men in the population > 0). In 
this case, a so-called one-tailed significance test could be performed. If we do not 
have a specific hypothesis in advance, both directions have to be allowed for, and 
this is called a two-tailed test. This is a more conservative approach and is therefore 
used more frequently; most statistics programs use a two-tailed test as the default 
option. 
An example of a significance test is presented in Table 2.3. In the top part of the 
table, the mean scores for men and women are presented. For women, the mean is 3.5; 
men score lower, with a mean of 2.0. Is this difference large enough for the result to 
be significant? In Table 2.3, a t-value (t = 1.82) and the corresponding probability 
(.106) are presented. The probability is calculated on the assumption that the 
difference really is zero in the population (H0 is true). It is higher than the limit 
commonly used (e.g., .05), and thus the null hypothesis cannot be rejected. A 
prerequisite for such tests, at least if the samples are small, is that the variances in the 
two groups are equal. Levene's test in the table (F = .000, Sig. = 1.000) gives no 
indication that this assumption is violated; the second row of the table shows a 
version of the test that can be used if the assumption does not hold. 
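The second row of Table 2.3 ("equal variances not assumed") corresponds to what is commonly called Welch's t-test. That test uses a separate variance estimate for each group and the Welch-Satterthwaite approximation for the degrees of freedom. Assuming that is what SPSS reports here, the sketch below reproduces t = 1.815 and df = 6.477 from the group statistics. The function name is ours.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic, standard error and Welch-Satterthwaite df."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean 1
    v2 = sd2 ** 2 / n2  # squared standard error of mean 2
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, se, df


# Women first, then men (group statistics from Table 2.3)
t, se, df = welch_t(3.5, 1.29099, 4, 2.0, 1.26491, 6)
print(f"t = {t:.3f}, SE = {se:.5f}, df = {df:.3f}")
```

The degrees of freedom are no longer a whole number. The approximation lowers them from 8 toward the smaller group's df, which makes the test a little more cautious.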

There are a number of different significance tests, depending on the hypotheses 
and on how the data are collected. In our example, the appropriate test is called a 
t-test for independent samples, because two independent groups were examined. If a 
different design had been used and, for example, the same person had been examined 
twice, a slightly different version of this significance test would be appropriate (the 
dependent samples t-test). When several groups are investigated, there will be many 
group means; comparing multiple group means can be done with analysis of variance 
(ANOVA). The basic principles involved in significance testing, however, are the same: 

• Formulate H0 and H1. 

• Select the significance level (.01 or .05). 

• Perform the calculations. What is the probability of getting the results if H0 
is true? 

• Conclude. Can H0 be rejected? 
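The four steps can be sketched in Python for the gender comparison in Table 2.3. The standard library has no t distribution, so the sketch takes a simple numerical route: it obtains the two-tailed probability by integrating the t density with Simpson's rule. The helper names are ours. In practice a statistics package would report this probability directly.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|) when H0 is true, via Simpson's rule on [0, |t|]."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 2 * (0.5 - area)


# Step 1: H0: mean(women) - mean(men) = 0; H1: the difference is not zero.
# Step 2: choose the significance level in advance.
alpha = 0.05
# Step 3: probability of a t at least this extreme (Table 2.3) if H0 is true.
t_obs, df = 1.823, 8
p = two_tailed_p(t_obs, df)
# Step 4: conclude.
reject_h0 = p < alpha
print(f"p = {p:.3f}; reject H0: {reject_h0}")
```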

2.8.3 Type 1 and Type 2 Errors 

When a significance test is conducted, four outcomes are possible: two correct 
decisions and two incorrect ones. The correct decisions are to reject the null 
hypothesis when it is false and to retain it when it is true. The wrong decisions are 
to retain the null hypothesis when it is false (type 2 error) or to reject it when it is 
true (type 1 error). The probability of making a type 1 error is set by the significance 
level. If the researcher wants to be very confident that such a mistake is avoided, a 
more stringent significance level should be chosen. To avoid a type 2 error, it is 
important that the study be conducted with enough participants that true correlations 
or differences between groups are discovered. 
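A small simulation makes the two error types concrete. The sketch assumes normally distributed scores, 25 people per group, and a pooled-variance t-test at the .05 level with the tabled critical value 2.011 for df = 48. When the true difference is zero, the share of simulated studies that reject H0 estimates the type 1 error rate. When the true effect is ES = .80, the share that fail to reject estimates the type 2 error rate. The effect and group size are illustrative choices matching the large-effect row of Table 2.4, and the function names are ours.

```python
import math
import random


def t_statistic(a, b):
    """Pooled-variance t statistic for two independent samples."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss1 = sum((x - m1) ** 2 for x in a)
    ss2 = sum((x - m2) ** 2 for x in b)
    sp2 = (ss1 + ss2) / (n1 + n2 - 2)  # pooled variance
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def rejection_rate(true_es, n, t_crit, reps=4000, seed=2024):
    """Share of simulated studies in which H0 (no difference) is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        group1 = [rng.gauss(true_es, 1.0) for _ in range(n)]  # population SD = 1, so ES = true_es
        group2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(group1, group2)) > t_crit:
            rejections += 1
    return rejections / reps


T_CRIT = 2.011  # two-tailed critical t at .05 for df = 48 (t table)

type1_rate = rejection_rate(0.0, 25, T_CRIT)  # H0 true: every rejection is a type 1 error
power = rejection_rate(0.8, 25, T_CRIT)       # H0 false: rejections are correct decisions
type2_rate = 1 - power                        # H0 false but retained
print(f"type 1 rate = {type1_rate:.3f}, power = {power:.3f}, type 2 rate = {type2_rate:.3f}")
```

The type 1 rate should come out close to .05. The power should come out close to .80, so the type 2 rate is close to .20. Both results are consistent with Table 2.4.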

Somewhat imprecisely, one can say that significance is a function of both the 
effect size and the sample size. This means that if we study a very strong association 
(high correlation), a small sample may be sufficient. If the relationship is weaker (a 
smaller correlation), a much larger sample is needed to detect it. It can sometimes 
be difficult to determine the necessary sample size before the study is conducted if 
the size of the effect is unknown. If similar studies have already been published, 
their results can be used to calculate the sample size needed. In Table 2.4, 
calculations have been performed for small, medium, and large effects in order to 
estimate the sample sizes needed. It is, of course, desirable that the study have 
sufficient statistical power; that is, the probability of correctly rejecting a false null 
hypothesis should be high (e.g., .80). 
TABLE 2.4 
Required Sample Size 

Effect                          N (One-Tailed Test)    N (Two-Tailed Test) 

Correlation 
Small effect      r = .10       600                    770 
Medium effect     r = .30       60                     81 
Large effect      r = .50       20                     25 

Differences Between Two Groups 
Small effect      ES = .20      310 (in each group)    390 (in each group) 
Medium effect     ES = .50      50 (in each group)     63 (in each group) 
Large effect      ES = .80      20 (in each group)     25 (in each group) 

Notes: Significance level = .05; statistical power = .80. 


Different programs can be used to perform such calculations; the calculations in 
Table 2.4 are performed with a program called “Power and Precision” (Borenstein, 
Rothstein, and Cohen 2001). When the calculations are performed, it is necessary 
to specify the size of the effect, significance level (one-tailed or two-tailed test), and 
the desired statistical power. 
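The group-comparison rows of Table 2.4 can be approximated without special software. The sketch below uses the common normal-approximation formula n = 2((z for the significance level + z for the power)/ES)² per group, rounded up. We assume this is a reasonable method; it is not necessarily what Power and Precision computes. It reproduces the table's group values, with one exception: it gives 393 rather than 390 for a small effect with a two-tailed test. It does not cover the correlation rows.

```python
import math
from statistics import NormalDist


def n_per_group(es, alpha=0.05, power=0.80, tails=2):
    """Approximate sample size per group for comparing two means (normal approximation)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / tails)  # about 1.96 for a two-tailed test at .05
    z_power = z.inv_cdf(power)              # about 0.84 for power = .80
    return math.ceil(2 * ((z_alpha + z_power) / es) ** 2)


for es in (0.20, 0.50, 0.80):
    print(f"ES = {es:.2f}: one-tailed {n_per_group(es, tails=1)}, "
          f"two-tailed {n_per_group(es, tails=2)} (per group)")
```

Halving the effect size roughly quadruples the required sample. This is why small effects demand such large studies.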

2.9 DESIGN AND VALIDITY 

Every research design has advantages and disadvantages, and the researcher needs 
to consider the options and possible methodological problems carefully before 
choosing the study design. In addition, practical, ethical, and economic factors will 
have to be considered. An important aspect when choosing a design is the impact it 
will have on validity. A very short and simple explanation of validity is that it relates 
to how far we can trust the conclusions drawn from a study. 

There are different forms of validity related to study design: statistical validity, 
internal validity, external validity, and construct validity. In short, statistical 
validity concerns whether we can trust the statistical conclusions (e.g., that there is a 
significant effect), internal validity is about causality, construct validity has to do 
with measuring constructs, and external validity has to do with generalizing findings 
over time, places, and people. This validity system was first introduced by Cook and 
Campbell in 1979; a revised edition of their classic book was published later 
(Shadish, Cook, and Campbell 2003). The system is particularly intended for research 
addressing causal problems, although some types of validity may be relevant to more 
descriptive research. 

2.9.1 Statistical Validity 

This type of validity relates to the extent to which we can trust the statistical 
conclusions from a study. Is there a significant difference between the two groups 
studied (experimental and control groups), and is the difference of a certain 
magnitude? To investigate this, we usually perform a significance test. There are 
several threats to statistical validity, such as low statistical power, which typically 
results from having too few participants in the study. If a statistically significant 
difference between the groups is detected, it is possible to move on to discuss other 
forms of validity. Thus, statistical validity is a prerequisite for the other types of 
validity: if the experimental and control groups show no significant differences, there 
is little point in discussing whether the findings can be generalized to other subjects. 

2.9.2 Internal Validity 

Internal validity has to do with causality. Is it possible to draw firm conclusions 
about cause and effect based on the study? The best way to ensure this is to use an 
experimental design with a control group. Participants are randomly assigned to the 
two conditions; this allows us to conclude that any difference observed between the 
groups can be attributed to the treatment. If a control group is not used and a 
pretest/posttest design is used instead (participants are tested before and after the 
course), internal validity is threatened because one cannot exclude other causes that 
may have produced the change. For example, at the same time that the employees 
were invited to participate in the course, the airline may also have implemented 
other changes that contributed to the observed effect. The longer the time between 
the two measurements, the more likely it is that other events occur and produce a 
change. There may be situations where a pretest/posttest design can be justified, for 
example, as a preliminary study of a new intervention before the intervention is 
implemented and examined in a larger experimental study. 

Even though a true experiment is optimal, it may sometimes be possible to use 
other designs—so-called quasi-experimental designs—to conduct valid research. 

2.9.3 Construct Validity 

If both statistical and internal validity have been addressed, it is possible to examine 
construct validity and external validity. Construct validity addresses whether one can 
generalize the relationship between cause and effect from the measured variables to 
the constructs. Suppose that we have implemented a CRM course for air traffic 
controllers, and the researcher is interested in whether the course has increased job 
engagement in this group. An experimental design is used, and the researcher has 
detected a significant difference in job engagement between the two groups after the 
course is completed. No major threats to statistical or internal validity have been 
detected. The question is then whether the findings can be generalized to the 
constructs, that is, to CRM courses in general and to the construct of job 
engagement. The latter requires that the specific measurement instrument actually 
measured job engagement. 

In addition, we must ensure that the CRM course has been conducted according 
to the course plan and with the specified content, because we want to conclude that 
the specific training in the CRM course produces the effect, rather than just any 
course. An intervention such as a course may contain many potentially effective 
ingredients. To investigate this further, we could also evaluate other training courses 
that focus only on safety in aviation and not on collaboration and communication, as 
most CRM courses do. If both types of courses are effective in increasing 
engagement, then the specific content of the CRM course is not the effective 
ingredient; rather, it is attending a course in general. Another possible outcome is 
that the two courses differ: the CRM group has the highest engagement scores, while 
the control group and the other course group (safety course) score lower. Then it 
would be reasonable to conclude that the specific CRM course is the effective 
treatment, rather than just any course. 

Construct validity in relation to study design is similar to construct validity in 
testing. The difference is that test validity usually concerns how well a single 
construct is measured. When we talk about construct validity in study design, 
however, two (or more) constructs are usually operationalized, and we want to draw 
conclusions about the causal relationship between them rather than about the 
measurement of one construct. 

2.9.4 External Validity 

External validity is the extent to which we can generalize the results to other groups, 
to other situations, and over time. For example, will the effect of participating in a 
course last over time? Could the course be applied in a different airline and perhaps 
for other professions? Often, not all of these questions can be answered in a single 
study, for a variety of reasons: 

• The study may have been limited to one group. 

• The study may have been conducted in a single organization or country. 

• There are limits to how long participants may be followed after the study 
is completed. 

• The experiment may have been conducted in a laboratory setting. 

An example would be determining how long it takes for a group of people to 
evacuate an airplane cabin under two conditions (high or low reward). If we observe 
differences between the groups, the question is whether this difference will also 
apply in a critical situation with actual passengers. Probably, some factors will be the 
same, but there will also be differences. A paradox is that a design that ensures good 
internal validity (i.e., a randomized controlled experiment in a laboratory setting) 
may weaken external validity (i.e., it may be more difficult to generalize the findings 
to a real-life situation). A design will therefore often represent a compromise 
between different concerns, and a design is rarely optimal with respect to all forms of 
validity. 

In order to document that the findings can be generalized to different settings, to 
different persons, and over time, it is necessary to perform many studies where these 
aspects are varied. An effective way to systematize and compare several studies is 
through a meta-analysis. 
2.10 META-ANALYSIS 

Meta-analysis involves using statistical techniques to combine results from several 
studies that address the same problem. Suppose a researcher is interested in 
examining whether psychological tests measuring spatial abilities can be used to select 
pilots. How can this problem be investigated? One solution is to perform a so-called 
primary study, that is, to conduct a new study that examines this problem directly. 
For example, a group of applicants for flight training is tested with a measure of 
spatial ability, and these results are later related to information on their 
performance. If the research hypothesis is correct, then those with the highest test 
scores would receive the highest ratings from the instructor. 

Another possibility would be to review other studies that have previously examined 
this issue. For most topics, the number of published studies addressing the research 
question ranges from a handful to many hundreds. If the number of studies is large, it 
can be difficult to get a clear picture of the overall results. In addition, studies 
probably vary in terms of the samples and measurement instruments used, and it may 
therefore be difficult to summarize everything simply by reading through the articles. 

An alternative to a narrative review of the studies would be to do a meta-analysis. 
In this approach, all the studies are coded, and an overall measure of effect (such as 
ES or r) is recorded from each article or report. In this example, it will probably be a 
correlation between test results and instructor ratings. The meta-analysis would 
involve calculating a mean correlation based on all studies; the next step would be to 
study the variation between studies. In some studies, a strong correlation may be 
detected, while in others no correlation between test results and performance will be 
found. 

A meta-analysis consists of several steps similar to the research process in a 
primary study: 

• formulating a research problem; 

• locating studies; 

• coding studies; 

• meta-analysis calculations; and 

• presenting the results. 

2.10.1 Literature Search and Coding of Articles 

Locating studies for a meta-analysis usually starts with a literature search using 
available electronic databases. These can be PsycINFO, which includes most of the 
articles published in psychology, or Medline, which contains the medical literature. 
There are many different databases, depending on the discipline of interest. It is 
important to use the right keywords so that all relevant studies are retrieved. In 
addition to searching these databases, studies that are published in the form of 
technical reports or conference presentations may also be used. Often these can be 
found on the Web, for example, via the Web sites of relevant organizations. In 
addition, reference lists of articles that have already been obtained could provide 
further studies. Basically, as many studies as possible should be collected, but it is, 
of course, possible to limit the meta-analysis to recent studies or to studies of a 
particular occupational group. 

Then the work of developing a coding form starts; the form should include all 
relevant information from the primary studies. Information about the selection of 
participants, age, gender, effects, measurement instruments, reliability, and other 
study characteristics may be included. The coding phase is usually the most 
time-consuming part of a meta-analysis; if many studies need to be coded, more than 
one coder is usually needed. Coder reliability should be estimated by having a sample 
of studies coded by two or more coders. 
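One common way to quantify coder reliability for categorical codes is Cohen's kappa. It corrects the raw percentage of agreement for the agreement two coders would reach by chance. The sketch below is our illustration, not a procedure prescribed in the text. The ten study codes (`coder_a`, `coder_b`) are hypothetical.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of studies that two coders placed in the same category."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    n = len(a)
    p_obs = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of the two coders' marginal proportions, summed over categories
    p_exp = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical design codes for ten studies ("exp" = experimental, "corr" = correlational)
coder_a = ["exp", "exp", "corr", "corr", "exp", "corr", "corr", "exp", "corr", "exp"]
coder_b = ["exp", "corr", "corr", "corr", "exp", "corr", "exp", "exp", "corr", "exp"]

print(f"agreement = {percent_agreement(coder_a, coder_b):.2f}, "
      f"kappa = {cohens_kappa(coder_a, coder_b):.2f}")
```

Here the coders agree on 8 of 10 studies. Because chance alone would produce 50% agreement with these category frequencies, kappa is noticeably lower than the raw agreement.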

2.10.2 Statistical Sources of Error in Studies 
and Meta-Analysis Calculations 

Hunter and Schmidt (2003) have described a number of factors or circumstances that 
may affect the size of the observed correlation (or effect size). One such factor is 
lack of reliability in the measurements, which will cause the observed correlations to 
be lower than if the measurements had been more reliable. These statistical error 
sources, or artifacts, will cause us to observe differences between the studies in 
addition to the variation caused by sampling error. When a meta-analysis is 
conducted, the effect sizes should be corrected for statistical errors; in addition, the 
observed variance between studies should be corrected for sampling error. Some of 
these error sources will be addressed in greater detail in Chapter 4 on selection. For 
simplicity, we will limit this presentation to a “bare-bones” meta-analysis, in which 
only sampling error is taken into account. 

The average effect size is calculated as a sample size weighted mean. Thus, studies 
based on a large number of people are given more weight than those based on 
smaller samples. The formula for calculating the mean effect size is 

$$\bar{r} = \frac{\sum_i N_i r_i}{\sum_i N_i} \qquad \text{or} \qquad \overline{ES} = \frac{\sum_i N_i \, ES_i}{\sum_i N_i}$$

where 

N = sample size 
r = correlation 
ES = effect size 

True variance between the studies is calculated as the difference between the 
observed variance between effect sizes and the variance due to sampling error—that 
is, random errors caused by studying a sample and not the entire population. 
The formula for estimating the population variance between the studies is 

$$\sigma^2_\rho = \sigma^2_r - \sigma^2_e$$

where 

$\sigma^2_\rho$ = population variance between correlations 
$\sigma^2_r$ = observed variance between correlations 
$\sigma^2_e$ = variance due to sampling error 

2.10.3 Meta-Analysis Example 

Suppose a researcher wants to sum up studies examining the relationship between an 
ability test and job performance for air traffic controllers. Suppose 17 studies 
reporting correlations and the corresponding sample sizes are available (see Table 
2.5). The data set is fictitious, but not completely unrealistic. The most important 
part will be to calculate the mean weighted correlation as a measure of the test’s 
predictive validity. In addition, it will be interesting to know whether there is some 
variation between studies or, to put it in a slightly different way: To what extent can 
the predictive validity be generalized across studies? 

TABLE 2.5 
Data Set with Correlations 

Study   N     r 
1       129   .22 
2       55    .55 
3       37    .44 
4       115   .20 
5       24    .47 
6       34    .40 
7       170   .22 
8       49    .40 
9       131   .20 
10      88    .37 
11      59    .28 
12      95    .26 
13      80    .42 
14      115   .25 
15      47    .38 
16      30    .40 
17      44    .50 

The calculations (presented in Figure 2.3) show that the average validity is .35 
(unweighted), while the sample size weighted mean is slightly lower (.30). This 
suggests a negative relationship between sample size and the correlation: studies with 
small samples have somewhat higher correlations than those with larger samples. If 
we correct the observed variance for sampling error, the remaining variance is almost 
zero (based on the output in Figure 2.3: 0.01110 - 0.01092 ≈ 0.00019). This means that 
there is practically no true variation between studies; sampling error accounts for 
about 98% of the observed variance. Thus, the average correlation is a good measure 
of the predictive validity of this test. 


Bare-bones meta-analysis calculations: 

Number of substudies                            17 
Number of correlations                          17 
Mean number of cases in substudies              76 
Mean number of cases in corr.                   76 
Total number of cases                           1302 
R Mean (Weighted)                               0.30260 
R Mean (Simple)                                 0.35059 

Observed variance                               0.01110 
   Std. dev.                                    0.10538 
Sampling variance                               0.01092 
   Std. dev.                                    0.10449 

Lower endpoint 95% CRI                          0.27580 
Upper endpoint 95% CRI                          0.32939 
Credibility value 90%                           0.28510 

Population variance                             0.00019 
   Std. dev.                                    0.01367 

Percentage of observed variance accounted for by sampling error: 98.32% 

95% confidence interval (Homogeneous case) 
   Lower endpoint: 0.25293 
   Upper endpoint: 0.35227 

95% confidence interval (Heterogeneous case) 
   Lower endpoint: 0.25250 
   Upper endpoint: 0.35269 

Analysis performed with Metados (Martinussen and Fjukstad, 1995) 

FIGURE 2.3 Bare-bones meta-analysis calculations. 
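The bare-bones calculations in Figure 2.3 can be reproduced with a few lines of Python. The sketch assumes the standard Hunter and Schmidt estimators, which the text does not spell out:

• The observed variance is sample size weighted.
• The sampling error variance is (1 - r²)²/(N - 1), where r is the weighted mean correlation and N the mean sample size.
• The 95% credibility interval is the weighted mean ± 1.96 times the population standard deviation.

With the Table 2.5 data, these choices match the Metados output to the printed precision. The function and key names are ours.

```python
import math

# (N, r) for the 17 fictitious studies in Table 2.5
STUDIES = [
    (129, .22), (55, .55), (37, .44), (115, .20), (24, .47), (34, .40),
    (170, .22), (49, .40), (131, .20), (88, .37), (59, .28), (95, .26),
    (80, .42), (115, .25), (47, .38), (30, .40), (44, .50),
]


def bare_bones(studies):
    """Bare-bones meta-analysis: only sampling error is corrected for."""
    k = len(studies)
    total_n = sum(n for n, _ in studies)
    mean_n = total_n / k
    r_simple = sum(r for _, r in studies) / k
    r_bar = sum(n * r for n, r in studies) / total_n  # sample size weighted mean
    var_obs = sum(n * (r - r_bar) ** 2 for n, r in studies) / total_n
    var_err = (1 - r_bar ** 2) ** 2 / (mean_n - 1)    # expected sampling error variance
    var_pop = max(var_obs - var_err, 0.0)             # negative estimates are set to zero
    sd_pop = math.sqrt(var_pop)
    return {
        "k": k,
        "total_n": total_n,
        "r_simple": r_simple,
        "r_bar": r_bar,
        "var_obs": var_obs,
        "var_err": var_err,
        "var_pop": var_pop,
        "pct_sampling_error": 100 * var_err / var_obs,
        "cri_95": (r_bar - 1.96 * sd_pop, r_bar + 1.96 * sd_pop),
    }


result = bare_bones(STUDIES)
for key, value in result.items():
    print(key, value)
```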


2.10.4 Criticism of the Meta-Analysis Method 

Currently, several traditions within meta-analysis employ somewhat different 
techniques to sum up and compare studies. The main difference concerns how the 
various studies are compared and whether inferential statistics (significance tests) 
are used to examine differences between studies. The alternative to such testing is 
the procedure suggested by Hunter and Schmidt (2003), in which the emphasis is on 
estimating the variance between studies. 
A problem that has been raised in relation to meta-analysis is whether the published 
studies can be said to be a representative sample of all studies conducted. Perhaps 
published studies are systematically different from unpublished studies. It is 
reasonable to assume that studies with significant findings are more likely to be 
published than studies with no significant results. Because a meta-analysis is largely 
based on published studies, it may overestimate the actual effect to some extent. 
Statistical methods can calculate the number of undiscovered studies with null effects 
that would have to exist in order to reduce the mean effect to a nonsignificant level. 
If this number is very large (e.g., 10 times the number of retrieved studies), then it is 
unlikely that so many undiscovered studies exist, and the overall effect can be 
trusted. 
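The text does not name a method, but a widely used one is Rosenthal's fail-safe N. It asks how many unpublished studies with an average effect of zero would be needed to bring the combined (Stouffer) Z below the one-tailed .05 criterion: N_fs = (ΣZ)²/1.645² - k, where k is the number of retrieved studies. The sketch applies it to the fictitious Table 2.5 data. It assumes that each study's Z can be approximated by Fisher's z transformation of r multiplied by √(N - 3). The function names are ours.

```python
import math
from statistics import NormalDist

# (N, r) for the 17 fictitious studies in Table 2.5
STUDIES = [
    (129, .22), (55, .55), (37, .44), (115, .20), (24, .47), (34, .40),
    (170, .22), (49, .40), (131, .20), (88, .37), (59, .28), (95, .26),
    (80, .42), (115, .25), (47, .38), (30, .40), (44, .50),
]


def fail_safe_n(studies, alpha=0.05):
    """Rosenthal's fail-safe N: null-result studies needed to make the combined Z nonsignificant."""
    # Assumption: each study's one-tailed Z is approximated by Fisher's z times sqrt(N - 3)
    z_sum = sum(math.atanh(r) * math.sqrt(n - 3) for n, r in studies)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for a one-tailed test at .05
    # Stouffer: z_sum / sqrt(k + X) = z_crit  =>  X = z_sum**2 / z_crit**2 - k
    return z_sum ** 2 / z_crit ** 2 - len(studies)


n_fs = fail_safe_n(STUDIES)
print(f"Fail-safe N = {n_fs:.0f}; 10 x retrieved studies = {10 * len(STUDIES)}")
```

For these data the result is roughly 830 studies, far more than 10 times the 17 retrieved studies.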

Another criticism of meta-analysis has been that studies that are not comparable 
are summarized together or that overly global categories are used for the effects. 
The research problem should guide coding decisions, including which categories and 
methods should be used. For example, if we are interested in the effect of therapy in 
reducing PTSD (posttraumatic stress disorder), then combining different types of 
treatment would make sense; however, if the researcher is interested in discovering 
whether cognitive behavioral therapy works better than, for example, group therapy, 
then separate categories for each form of therapy will obviously be needed. This 
problem is known in the literature as the “apples and oranges” problem. Whether it 
is a good idea to combine apples and oranges depends on the purpose. If one wants 
to make fruit salad, it can be a very good idea; on the other hand, if one only likes 
apples, the oranges are best avoided. 

2.11 RESEARCH ETHICS 

Research has to comply with many rules and regulations in addition to scientific 
standards, among them research ethics. Research ethics concern how participants are 
treated and the relationship between the researcher and other researchers, as well as 
relationships with the public. International conventions, as well as national 
regulations and laws, govern research ethics, and each country, and sometimes even 
large organizations, has its own ethics committees where all projects are evaluated. 
One important international agreement is the Helsinki Declaration, which covers 
biomedical research conducted on humans. According to the declaration, research 
should be conducted in accordance with recognized scientific principles by a person 
with research competence (e.g., a PhD), and the subjects’ welfare, integrity, and right 
to privacy must be safeguarded. Informed consent to participate in the study should 
be obtained from the subjects before the study begins. 

In general, there is substantial agreement on the basic principles that should govern 
research, although the formal approval procedures to which projects are subjected 
may vary slightly from country to country. People who participate in research 
projects should be exposed to as little discomfort or pain as possible, and this must 
be carefully weighed against the potential benefits of the research. These two 
perspectives, society’s need for knowledge and the welfare of the participants, need 
to be balanced. Participation should be voluntary, and special care needs to be taken 
when people are in a vulnerable position or in a special relationship to the 
researcher (e.g., a subordinate or a client). 
An important principle in relation to research ethics is informed consent. This 
means that the person asked to participate should receive information about the 
purpose of the study, the methods to be used, and what participation will involve in 
terms of time, discomfort, and other factors. The person should also receive 
information about opportunities for feedback and whom to contact if additional 
information is needed. It should also be emphasized that participation is voluntary 
and that information is treated confidentially. For many studies involving an 
experiment or an interview, it is common for people to sign a consent form before the 
study begins. If the study is an anonymous survey, a consent form is not usually 
attached; instead, the person gives consent implicitly by submitting the 
questionnaire. The person should also be informed that he or she may withdraw from 
the study at any time and have his or her data deleted. 

In some studies, especially in social psychology, subjects have been deceived about 
the real purpose of the study. An example is experiments in which one or more 
research assistants act as participants; the purpose is to study how the test subjects 
are influenced by what other people say or do. The most famous experiments in 
which subjects were deliberately misled about the true purpose were the Milgram 
studies on obedience conducted in the 1960s. These were presented as experiments 
on learning and memory, but they were really experiments on obedience. Subjects 
were asked to punish with electrical shocks a person who in reality was a research 
assistant. Many continued to give electric shocks even after the person screamed for 
help. Many of the subjects showed various stress responses and obvious discomfort 
in the situation, but nevertheless continued to give shocks. 

These studies violate several of the ethical principles outlined in this section:
the lack of informed consent, the pressure placed on people to continue even after
they had indicated that they no longer wanted to participate, and the exposure of
people to significant discomfort, even though they were informed about the purpose
of the study afterward.

Such experiments would probably not be approved today, and a researcher would
need to make a strong argument for why it would be necessary to deceive people
on purpose. If the researcher did not inform the participants about the whole purpose
of the project or withheld some information, the participants would need to be
debriefed afterward. A common procedure in pharmaceutical testing is to give one
group of people the drug while the other group (the control group) receives a placebo
(i.e., tablets without active substances). Such clinical trials would be difficult to
perform if the subjects were told their group assignment in advance. Instead, it is
common to inform the subjects that they will be assigned to either the experimental
group or the placebo group and that they will not be told which group they belonged
to until the experiment has ended.

2.12 CHEATING AND FRAUD IN RESEARCH 

Research dishonesty can take many forms. One of the most serious is tampering
with or outright fabrication of data. Several cases have been reported in the media,
both from psychology and from other disciplines, in which scientists have fabricated
all or part of their data. Another form of dishonesty is to withhold parts of a data set
and selectively present the data that fit the hypothesis. Other types of dishonest
behavior include stealing other researchers’ ideas or text without quoting or
acknowledging the source, and presenting misleading representations of others’ results.

It should be possible to verify a researcher’s results, so the raw data should be
stored for at least 10 years after the article has been published, and the data should
be made available to others in the event of any doubt about the findings. Many
countries have permanent committees that will investigate fraud and academic
dishonesty whenever needed.

2.13 SUMMARY 

During the Crimean War, far more British soldiers died in field hospitals than on
the battlefield due to various infections and poor hygienic conditions. Florence
Nightingale discovered these problems and implemented several reforms to improve
health conditions in the field hospitals. To convince the health authorities of the
benefits of hygiene interventions, she used statistics, including pie charts, to
demonstrate how the mortality rates of the hospitals changed under different conditions
(Cohen 1984). She also demonstrated how social phenomena could be objectively
measured and analyzed and that statistics were important tools to make a convincing
argument for hospital reforms.

The main topic of this chapter has been how we can gain new knowledge and the
important requirements that research must meet with respect to methods, design,
analyses, and conclusions. Research methods and statistics are important tools when
a phenomenon is investigated and the results are presented to others. Without
methodology and statistics, it would be very difficult to present a convincing argument
for the statements one wishes to make, and the example set by Florence Nightingale is
still valid today.

3 Aviation Psychology, Human Factors, and the Design of Aviation Systems


This, indeed, is the historical imperative of human factors—understanding why people 
do what they do so we can tweak, change the world in which they work and shape their 
assessments and actions accordingly. 

Dekker (2003, p. 3) 


3.1 INTRODUCTION 

As noted in the introduction in Chapter 1, aviation psychology is closely related to
the field known as human factors. In recent years the distinction among aviation
psychology, human factors, and the more hardware-oriented discipline of engineering
psychology has become very blurred, with practitioners who claim allegiance to these
disciplines performing very similar research and applying their knowledge in very
similar ways. Traditionally, engineering psychology might be thought of as focusing
more on humans, and human factors as focusing somewhat more on hardware and its
interface with the human operator. For all practical purposes, however, the
distinction between the two disciplines is irrelevant. It is mentioned here only to
alert the reader to the terminology, because much of what we would label as aviation
psychology is published in books and journals labeled as human factors.

Setting aside the differences in terminology, aviation psychology (or human
factors) has a great deal to say about how aviation systems should be designed. To
meet the goals of reducing errors, improving performance, and enhancing comfort, a
system must accommodate the physical, sensory, cognitive, and psychological
characteristics of the operator. A system must not demand that the operator lift excessive
weights or press a control with an impossible amount of force. A system must not
require that the operator read information written in a tiny font or make fine
distinctions of sound when operating in a noisy environment. A system must not demand
complex mental arithmetic or the memorization and perfect recall of long lists of
control settings, dial readings, and procedures. A system must not demand that the
operator remain immune to the social stresses placed on him or her by co-workers or
to the demands of management to cut corners to accomplish the job.

Knowledge of human capabilities, strengths, and limitations informs the system
design process because this knowledge sets the bounds for the demands the
system may make of the operator. An extensive body of research has addressed these
bounds. Researchers have studied how much weight humans can lift to specified
heights, the numbers of errors that occur when identical controls are placed side by
side, how many numbers can be recalled from short-term memory, the font size of
displays, legibility of displays under varying degrees of illumination, and the effects
of an organization’s climate on the safety-related behavior of workers, to list but a
few examples. The overall aim of this chapter is to demonstrate how psychological
knowledge may be used when designing aviation systems, which principles should
be applied, and which errors and problems commonly occur when humans interact
with complex systems and equipment.

3.2 TYPES OF HUMAN ERROR 

Arguably, the present status of aviation psychology and human factors owes much
to the efforts of researchers during World War II. The sheer magnitude of the war
effort led researchers on both sides of the conflict to conduct extensive studies with
the aim of improving personnel performance and reducing losses due to accidents
and combat. Perhaps the most frequently cited study in the area of aviation
psychology and human factors produced by that era was the work by Fitts and Jones (1947a)
on the causes of errors among pilots.

Fitts and Jones (1947a, 1961a) surveyed a large number of U.S. Army Air 
Force pilots regarding instances in which they committed or observed an error 
in the operation of a cockpit control (flight control, engine control, toggle switch, 
selector switch, etc.). They found that all errors could be classified into one of 
six categories: 

• substitution errors—confusing one control with another or failing to
identify a control when it was needed;

• adjustment errors—operating a control too slowly or too rapidly, moving a
switch to the wrong position, or following the wrong sequence when operating
several controls;

• forgetting errors—failing to check, unlock, or use a control at the proper time;

• reversal errors—moving a control in the direction opposite to that
necessary to achieve the desired result;

• unintentional activation—operating a control inadvertently without being
aware of it; and

• unable to reach a control—inability physically to reach a needed control or
being required to divert attention from an external scan to such a point that
an accident or near-accident occurred.

Substitution errors accounted for 50% of all the error descriptions reported; the
most common of these were confusion of throttle quadrant controls (19%),
confusion of flap and wheel controls (16%), and selection of the wrong engine
control or propeller feathering button (8%). The conditions that gave rise to such results
are illustrated in Table 3.1, using data provided by Fitts and Jones (1961a, p. 339) on
the throttle quadrant configurations of three common aircraft of that era. Similar
difficulties were encountered with the controls for the flaps and landing gear, which
at that time were often located close to one another and used the same knob shape.

TABLE 3.1
Aircraft Configurations Leading to Control Confusion

              Control Sequence on Throttle Quadrant
Aircraft      Left          Center        Right
B-25          Throttle      Propeller     Mixture
C-47          Propeller     Throttle      Mixture
C-82          Mixture       Throttle      Propeller
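The substitution problem in Table 3.1 can be made concrete with a small sketch. The
Python below encodes the three quadrant layouts from the table and lists, for any pair
of aircraft, which controls sit in a different slot; each such control is a place where
a habit learned on one aircraft would put the pilot's hand on the wrong lever. The
function names are hypothetical, and the sketch says nothing about how often such
errors actually occurred.

```python
# Throttle-quadrant layouts from Table 3.1, listed left to right.
QUADRANTS = {
    "B-25": ("Throttle", "Propeller", "Mixture"),
    "C-47": ("Propeller", "Throttle", "Mixture"),
    "C-82": ("Mixture", "Throttle", "Propeller"),
}

SLOTS = ("left", "center", "right")


def position_of(aircraft: str, control: str) -> str:
    """Return the quadrant slot ('left', 'center', 'right') of a control."""
    return SLOTS[QUADRANTS[aircraft].index(control)]


def relocated_controls(from_aircraft: str, to_aircraft: str) -> list[str]:
    """Controls whose slot changes when a pilot moves between two aircraft.

    Each relocated control is a candidate for a substitution error, because
    the habit formed on the first aircraft points to a different lever.
    """
    return sorted(
        control
        for control in QUADRANTS[from_aircraft]
        if position_of(from_aircraft, control) != position_of(to_aircraft, control)
    )


if __name__ == "__main__":
    names = sorted(QUADRANTS)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            print(f"{first} -> {second}: {relocated_controls(first, second)}")
```

Run on the table, the sketch shows that a pilot moving between the B-25 and the C-82
would find every one of the three controls in a different position, while the C-47
and C-82 share only the central throttle.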

Fortunately for today’s pilots, many of the recommendations of Fitts and Jones
and other researchers of that period have been implemented. The configuration of
the six principal instruments, the order of controls on the throttle quadrant for
propeller-driven aircraft, and the shapes of the controls themselves are all now fairly
standardized. The shape of the knob for the landing gear resembles a wheel, the
shape of the flaps knob resembles an airfoil, and the two controls are located as far
apart as possible while still remaining easily accessible to the pilot.

Although these sorts of errors have been largely, though not entirely, eliminated,
others remain. “Forgetting” errors, which in the Fitts and Jones study accounted for
18% of the total errors, remain a problem in today’s aircraft. The shape of the
landing gear control may have largely prevented its confusion with the flaps; however, the
pilot must still remember to lower the gear prior to landing. Memory devices, paper
checklists, and, in the case of more advanced aircraft, computer watchdogs all serve
to prevent the pilot from making the all-too-human error of forgetting. Interestingly,
one of the recommendations of Fitts and Jones (1961a, p. 333) was to make it
“impossible to start the takeoff run until all vital steps are completed.” Clearly, this is a goal
that still eludes us: Pilots still attempt takeoffs without first extending the
leading-edge slats and flaps, and they make landings without prior arming of the spoilers—
typically, after defeating the warning systems put in place to prevent such events.

3.3 HUMAN CHARACTERISTICS AND DESIGN 

At a more general level than the work by Fitts and Jones, Sinaiko and Buckley (1957; 1961, 
p. 4) list the following general characteristics of humans as a system component: 

• physical dimensions; 

• capability for data sensing; 

• capability for data processing; 

• capability for motor activity; 

• capability for learning; 

• physical and psychological needs; 

• sensitivities to physical environment; 
• sensitivities to social environment;

• coordinated action; and 

• differences among individuals. 

All of these characteristics must be taken into account in the design of aviation
systems. Some of the system requirements driven by these characteristics are
reasonably well understood and have been addressed in system design for many years.
For example, certainly since the work of Fitts and Jones following World War II,
designers have been aware of the need to mark and separate controls properly and to
arrange displays in a consistent way. However, the implications of some of the
characteristics are still being explored. The work over the past 20 years on crew resource
management (see Helmreich, Merritt, and Wilhelm, 1999, for an overview) is
evidence of our growing understanding of the sensitivities of humans to their social
environment and capabilities for coordinated action. Even more recently, researchers
have begun to explore the influences of the organizational climate and culture on the
performance of aircrew (Ciavarelli et al. 2001).

Of particular relevance to aviation psychology is the notion of differences among
individuals. Although Sinaiko and Buckley (1957) list it as a separate characteristic,
it is really inherent in all the other characteristics they list. Humans vary, often
considerably, on every characteristic by which they may be measured. The measurement
of these individual differences and determination of how they contribute to other
characteristics of interest—such as success in training, likelihood of an accident,
skill at making instrument landings, or probability of being a good team member—
are at the heart of aviation psychology.

In addition to examining the errors associated with controls, Fitts and Jones
(1947b, 1961b) also examined errors in reading and interpreting aircraft instruments.
As in their study of control errors, they classified errors in reading or interpreting
instruments into nine major categories. Errors in reading multirevolution instrument
indications accounted for the largest proportion of errors (18%). Misreading
the altimeter by 1,000 feet was the most common of these errors, accounting for
13% of the total errors. Additional errors included reversal errors (17%), signal
interpretation errors (14%), legibility errors (14%), substitution errors (13%), and using
inoperative instruments (9%).
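One simple way a 1,000-foot misreading can arise on a multirevolution altimeter can be
sketched in Python. The sketch assumes a three-pointer design in which the long pointer
turns once per 1,000 ft, the short pointer once per 10,000 ft, and a thin pointer once
per 100,000 ft, and it models the misreading as taking the thousands digit that the short
pointer is nearest to instead of the one it has most recently passed. This is an
illustrative assumption, not the error mechanism reported by Fitts and Jones, and the
function names are hypothetical.

```python
import math


def pointer_positions(altitude_ft: float) -> dict[str, float]:
    """Position of each pointer on a 0-10 dial for a given altitude.

    Assumed three-pointer layout: long pointer = one turn per 1,000 ft,
    short pointer = one turn per 10,000 ft, thin pointer = one turn per 100,000 ft.
    """
    return {
        "hundreds": (altitude_ft % 1_000) / 100,
        "thousands": (altitude_ft % 10_000) / 1_000,
        "ten_thousands": (altitude_ft % 100_000) / 10_000,
    }


def read_altitude(positions: dict[str, float], nearest_thousand: bool = False) -> float:
    """Reconstruct altitude from the pointers.

    A correct reading takes the last digit each short pointer has passed
    (floor). With nearest_thousand=True the thousands digit is taken from
    whichever mark the short pointer is closest to, a hypothetical model of
    the misreading.
    """
    ten_thousands = math.floor(positions["ten_thousands"])
    if nearest_thousand:
        thousands = round(positions["thousands"])
    else:
        thousands = math.floor(positions["thousands"])
    return ten_thousands * 10_000 + thousands * 1_000 + positions["hundreds"] * 100


if __name__ == "__main__":
    dial = pointer_positions(2_950)
    print(read_altitude(dial))                         # correct reading
    print(read_altitude(dial, nearest_thousand=True))  # 1,000 ft too high
```

Under this model the error appears only when the long pointer is in the upper half of
the dial, because only then has the short pointer travelled more than halfway toward
the next thousands mark.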

Among their several conclusions, Fitts and Jones (1961b, p. 360) noted that “the
nature of instrument-reading errors is such that it should be possible to eliminate most
of the errors by proper design of instruments.” Arguably, 60 years after the first
publication of their work, current researchers could still arrive at the same conclusion.
Although most of the issues associated with the shape of controls have been resolved,
problems with displays remain. However, they are not necessarily the same problems
identified by Fitts and Jones. Multirevolution instruments (most notably the altimeter) have
been replaced with instruments that depict information differently—typically, along
a vertical scale in the case of altitude. Yet pilots still occasionally fly into the ground
because, even after reading the instrument correctly, they have misprogrammed the
system that controls the vertical flight profile of their aircraft. Likewise, radio
navigation beacons required pilots to identify aurally the Morse code signal transmitted
by the beacon, which gave rise to signal interpretation errors; these beacons have given
way to GPS navigation, with its own set of display problems and corresponding errors.

Each new generation of technology offers some solutions to the problems that 
existed in the older generation, while creating a whole new set of problems. This 
situation is succinctly described by Dekker (2002, p. 8), who notes that “aerospace 
has seen the introduction of more technology as illusory antidote to the plague of 
human error. Instead of reducing human error, technology changed it, aggravated the 
consequences and delayed opportunities for error detection and recovery.” 

3.4 PRINCIPLES OF DISPLAY DESIGN 

One way to break this chain of technology and error is to step outside the specific
technologies and look at overarching principles that should be applied to all new
technology development. Thus, instead of looking for the best shape for the landing
gear control, we might look for the general principles by which such controls should
be designed. As an example, let us consider the design of aircraft displays. Wickens
(2003), one of the preeminent researchers in the area of aviation displays, has
enumerated seven critical principles of display design, which are described next.

3.4.1 Principle of Information Need 

How much information does a pilot need? The short answer is “just enough.” Too
little information (e.g., the absence of weather radar on days when thunderstorms
are present) leaves the pilot flying, and making decisions, blind. Most pilots would
agree that having more information is good, but more is not always better: having too
much information can be as damaging as having too little. Too much information can
lead to a cluttered flight deck (typified by L-1011 and DC-10 era aircraft) with
hundreds of dials and indicators. Searching for the needed information among all
the extraneous information can lead to poor performance on critical, time-sensitive
tasks. Current-generation aircraft, in contrast, have combined many formerly
separate information sources into displays that integrate information, such
as engine health, into a single, easily interpretable instrument. When that
information is needed, it is easily obtained.

To determine how much information is enough, we turn to a family of techniques
subsumed under the title “task analysis”* (cf. Kirwan and Ainsworth 1992; Meister
1985; Seamster, Redding, and Kaempf 1997; Shepherd 2001; Annett and Stanton
2000). Although several varieties of this technique exist and are sometimes used for
different purposes (e.g., for training development or for personnel selection, as
mentioned in other chapters of this book), they share a general approach to the orderly
specification of the tasks that a person must accomplish, the actions (both physical
and cognitive) that the person must complete, and the information required to permit
the person to complete the actions. For example, we might specify the information
required to complete a precision instrument approach or the information required to
identify which of several engines has failed. If the pilot is expected to complete these
tasks (making the instrument approach and dealing with the failed engine), then he
or she must have the required information. In addition, that information should not
be hidden by or among other bits of information.

* This brief discussion cannot hope to do justice to a topic that is the subject of many volumes. The
reader is encouraged to consult the general references to task analysis listed here for more information.
However, even these are only a tiny sampling of the vast amount of information available. Because task
analysis methods vary according to the intended use of the information, the survey of methods given
in Annett and Stanton (2000) may prove most beneficial for the task analysis novice.
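The information-need side of a task analysis can be sketched as a small data structure:
tasks made of actions, each action tagged with the information it depends on. Comparing
the union of those requirements with what the displays actually provide exposes both
failure modes named under the principle of information need: items that are missing
(too little) and items no analyzed task uses (candidate clutter). The class names, task
breakdown, and information labels below are hypothetical illustrations, not a
published task analysis method.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """One step of a task; kind is "physical" or "cognitive"."""
    description: str
    kind: str
    info_needed: set[str] = field(default_factory=set)


@dataclass
class Task:
    name: str
    actions: list[Action]

    def information_requirements(self) -> set[str]:
        """Union of the information every action in the task depends on."""
        needed: set[str] = set()
        for action in self.actions:
            needed |= action.info_needed
        return needed


def missing_information(tasks: list[Task], displayed: set[str]) -> dict[str, set[str]]:
    """Per task, the required items that no display provides (too little)."""
    gaps = {}
    for task in tasks:
        missing = task.information_requirements() - displayed
        if missing:
            gaps[task.name] = missing
    return gaps


def extraneous_information(tasks: list[Task], displayed: set[str]) -> set[str]:
    """Displayed items that no analyzed task requires (candidate clutter)."""
    required: set[str] = set()
    for task in tasks:
        required |= task.information_requirements()
    return displayed - required


# Hypothetical example built on the two tasks mentioned in the text.
approach = Task("precision instrument approach", [
    Action("track localizer", "physical", {"localizer deviation", "heading"}),
    Action("track glideslope", "physical", {"glideslope deviation", "airspeed"}),
    Action("decide at minimums", "cognitive", {"altitude", "decision altitude"}),
])
engine_failure = Task("identify failed engine", [
    Action("compare engines", "cognitive",
           {"turbine speed, all engines", "oil temperature, all engines"}),
])
displayed = {
    "localizer deviation", "glideslope deviation", "heading", "airspeed",
    "altitude", "turbine speed, all engines", "oil temperature, all engines",
    "cabin temperature",
}

if __name__ == "__main__":
    print(missing_information([approach, engine_failure], displayed))
    print(extraneous_information([approach, engine_failure], displayed))
```

In this made-up panel, the approach lacks a decision-altitude reference, and cabin
temperature is flagged only because neither analyzed task uses it; a fuller analysis
covering more tasks might well justify it.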

3.4.2 Principle of Legibility 

In order to be useful, information presented on displays must be readable. Further, it 
must be readable under the conditions that exist in the aircraft flight deck. This means 
that the digits on the display must be large enough to be read by the pilot from his or 
her normal seated position. In some cases, they should also be readable by the other 
crew member—for example, if there is only one such display on the flight deck and 
both crew members must use it. Typically, designers solve this problem by locating 
common displays and controls midway between the two pilots on a central console. 

Legibility also requires consideration of effects such as glare and vibration. For
example, almost all pilots quickly learn to steady their hands when reaching to tune
the radio, because even mild turbulence can make such a task quite difficult to perform
rapidly and accurately. This vibration also affects the legibility of displays, and the
usual solution is to use a larger font size so that the information can be read
under the full range of operating conditions. The effects of factors such as these on
human performance have been extensively investigated and are summarized by Boff,
Kaufman, and Thomas (1988). In addition, the Engineering Data Compendium is an
extensive online resource from the Human Systems Integration Information Analysis
Center of the U.S. Air Force.* Readers may also find Sanders and McCormick (1993)
and Wickens and Hollands (2000) useful references on this topic.

3.4.3 Principle of Display Integration/Proximity Compatibility Principle 

The novice pilot, particularly the novice instrument pilot, has no doubt that
scanning the instruments to obtain the information required to control the aircraft and
navigate requires effort. The level of effort required can be increased or decreased
by the degree to which the instruments are physically separated. Thus, in all modern
aircraft the primary flight instruments are located directly in front of the pilot. This
reduces the time required for the pilot to move his or her scan from one instrument
to another. It also means that the instruments can be scanned without moving the
head—thus reducing the potential for vestibular disorientation.

* http://www.hsiiac.org/products/compendium.html

In addition, effort can be reduced if displays that contain information that must be
integrated or compared are close together. This is seen most clearly in multiengine
aircraft where two, three, or four sets of engine instruments are arrayed (in older
aircraft) in columns, with each column corresponding to one engine and each row to one
engine parameter (e.g., oil temperature or turbine speed). Given this arrangement,
the pilot may quickly scan across all the engine temperature readings, for example,
to identify an engine with an anomalous reading.
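The row-by-column comparison that this layout supports can be mimicked in a short
Python sketch. Each row holds one parameter for every engine, and an engine is flagged
when its reading strays from the median of its row by more than a tolerance band. The
readings, the 10% band, and the function name are hypothetical; real engine limits come
from the aircraft's own documentation, and the sketch only illustrates why placing
like readings side by side makes the odd one out easy to spot.

```python
from statistics import median

# Rows are engine parameters, columns are engines 1-4, mirroring the
# column-per-engine panel layout. The readings are hypothetical.
ENGINE_PANEL = {
    "oil temperature (deg C)": [92.0, 94.0, 121.0, 93.0],
    "turbine speed (% rpm)": [95.1, 95.3, 94.9, 95.2],
}


def anomalous_engines(
    panel: dict[str, list[float]], tolerance: float = 0.10
) -> list[tuple[str, int, float]]:
    """Scan each row and flag engines whose reading is far from the row median.

    tolerance is an assumed fractional band (10% by default), not an
    operating limit taken from any aircraft manual.
    """
    flags = []
    for parameter, readings in panel.items():
        typical = median(readings)
        for engine_number, value in enumerate(readings, start=1):
            if abs(value - typical) > tolerance * abs(typical):
                flags.append((parameter, engine_number, value))
    return flags


if __name__ == "__main__":
    for parameter, engine, value in anomalous_engines(ENGINE_PANEL):
        print(f"Engine {engine}: {parameter} = {value}")
```

Comparing each engine with its neighbors, rather than with a fixed limit, is the
computational analogue of the pilot's scan across a row: the healthy engines supply
the reference.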

Further examples are evident in the navigation instruments. For example, the lights
indicating passage of marker beacons during a precision instrument approach are
typically located close to the primary flight control instruments and the instrument landing
system (ILS) display. Having the lights in the direct field of view of the pilot, instead of
somewhere in the radio stack, enhances the likelihood that they will be seen by the pilot.
This is particularly important for these displays because they usually are extinguished
after passage, with no persistent indicator that an important event has transpired.

Integration of related information into a single instrument represents a means to
further reduce pilot workload by eliminating the necessity to scan multiple
instruments visually and, potentially, eliminating the necessity to combine separate bits of
information cognitively. Perhaps the best example of this integration is the display
for the flight management system (FMS) on a modern transport category aircraft.
This system brings together in one display (typically called the primary flight
display) virtually all the information required for control of the aircraft and for
horizontal and vertical navigation.

At a somewhat simpler level, the flight director built into the attitude displays 
of some general aviation aircraft illustrates the same principles of integration. The 
flight director provides visual cues on the attitude indicator. In the simplest form, 
these cues can take the form of simple horizontal and vertical lines—depicting the 
localizer and glideslope for an ILS, for example. Essentially, this arrangement moves 
these indicators from the ILS display to the attitude indicator, thus eliminating the 
need for the pilot to move his or her scan between these two instruments. Another 
configuration makes use of a black inverted “V” that represents the visual cue from 
the flight director, as shown in Figure 3.1. 

In this figure, the triangle represents the aircraft on the attitude indicator. In this 
situation, the flight director cues indicate that the pilot needs to bank the aircraft 
to the left. If the pilot keeps the triangle tucked up into the black “V,” he or she 
will satisfy the cues from the flight director and will follow the desired course (i.e., 
ILS localizer/glideslope, VOR [very high frequency omnidirectional radio] radial, or 
GPS [global positioning system] course). 

3.4.4 Principle of Pictorial Realism 

This principle holds that the display should resemble, or be a very similar
pictorial representation of, the information it represents. The altitude tape, which
moves vertically to represent altitude, is one application of this principle. The
current generation of moving map displays, which can show the location of the aircraft
(often depicted with a small aircraft symbol) against a background of topographic
imagery, is arguably an even stronger example of this principle.

3.4.5 Principle of the Moving Part 

According to this principle, the element that moves on a display should
correspond to the element that moves in a pilot’s mental model of the aircraft. In
addition, the direction of movement of the display element should correspond with
the direction of movement in the mental representation. Arguably, this principle
is best illustrated by the instrument that most thoroughly violates the principle:
the attitude indicator. In the attitude indicator, the horizon is depicted as a
moving element, whereas the aircraft is shown as static. However, this is completely
opposite to the pilot’s mental model, in which the horizon is static and the aircraft
banks, climbs, and descends. The sacrifice in human performance demanded by
this arrangement is reflected in the finding that, for novice pilots, the moving
aircraft display is more effective than the moving horizon display. Furthermore,
even for pilots who are experienced in flying with the traditional, moving horizon
display, the moving aircraft display is no less effective (Cohen et al. 2001; Previc
and Ercoline 1999).

FIGURE 3.1 Typical general aviation flight director display.

3.4.6 Principle of Predictive Aiding 

Predicting the future state of the aircraft (heading, altitude, rate of climb or descent,
bearing to some beacon, etc.) is a complex and cognitively demanding task. Insofar
as possible, displays should assist the pilot in this task by showing what will
happen in the future. This allows the pilot to take steps now so that the desired state is
achieved or an undesirable state is avoided. Many of the current generation of FMSs
provide this service by showing predicted flight paths, based on current engine and
control settings.

However, valuable assistance may be obtained from far less sophisticated systems.
Consider the example of the fuel gauge, which we will discuss in more detail later
in this chapter. Most current designs simply represent the current status of the fuel
supply—somewhere between full and empty. However, a slightly more sophisticated
gauge could show future states, such as when and/or where zero fuel remaining will
be reached, based on the current fuel load and consumption rate. This simple
predictive aiding might help prevent accidents due to fuel mismanagement, which
account for 10% of all accidents.

3.4.7 Principle of Discriminability: Status versus Command 

Preventing confusion among similar displays is a responsibility of designers.
Unfortunately, engineering demands often lead to sacrifices in this area. One
example is the use of identically sized displays for all the engine instruments. This
commonality saves money during manufacturing because only one size of hole need be
punched in the panel; however, it can later lead to a pilot mistaking one instrument for
another, with results that can range from humorous to disastrous.

Unambiguous information is essential for the safe operation of the aircraft.
Particularly problematic are those instances in which similar information, with an
entirely different meaning, is displayed in a common display. This condition is not
unknown in FMSs and has been cited as a cause of at least one major crash (the Air
Inter Airbus A320 that crashed near Mont Sainte-Odile, France, in 1992).

In addition to Wickens, many other researchers have also evaluated display 
issues. For example, looking specifically at the symbols used in displays, Yeh and 
Chandra (2004) posed four questions to be addressed when evaluating the usability 
of a symbol: 

• Is the symbol easy to find? 

• Is the symbol distinctive from other symbols? 

• Is the on-screen symbol size appropriate? 

• Can all encoded attributes of the symbol be decoded quickly and accurately? 

Much of the work up through the mid-1980s is summarized in Boff et al. (1988) and 
is readily available through the Engineering Data Compendium,* an online source 
of data related to human performance and design issues. 

The principles espoused by Wickens and others in the design of aviation
systems, along with the results of many empirical studies on the effects of system
characteristics on human performance, are codified in government regulations
pertaining to the design of aircraft control and display systems. Particularly
detailed listings of design standards are also provided in the military standards
and handbooks used to govern the design and development of new military
aircraft and related systems. Indeed, much of the development of the knowledge
relating to human capabilities and the corresponding standards for system design
has been led by the military. One recent example of the military’s efforts to
improve the design process is the U.S. Army’s MANPRINT program (Booher
1990, 2003).


* http://www.hsiiac.org/products/compendium.html

3.5 SYSTEM DESIGN

MANPRINT stands for “manpower, personnel, and training integration”; however,
MANPRINT also subsumes human factors, systems safety, and health hazards
concerns. This program, which is codified into an army regulation, expresses the U.S.
Army’s philosophy of a soldier-centric design process, in which the design of new
military systems (e.g., new helicopters) starts from a consideration of who will
operate (pilot) and maintain (aircraft mechanics) the new system.

3.5.1 Manpower 

In the MANPRINT program, manpower expresses the number of people who will
be required to operate and maintain the system. Clearly, this is a major concern for
the military because a system that requires additional people, all things being equal,
will cost more to operate than a system that requires fewer people. The same concern
arises in nonmilitary settings. For example, the design of two-person flight decks in
the current generation of transport aircraft represents a considerable savings to
airlines over the previous flight decks that required three (or more) operators. Consider,
however, the impact of such a design decision.

Changing from a three-person to a two-person crew requires much more than 
simply moving all the deleted person’s controls and displays to one of the other crew 
members. It requires, first, a careful consideration of the tasks that the third person 
performed, the tasks that the other persons performed, and the potential for having 
the system perform some or all of these tasks. If these tasks were essential for the 
safe and efficient operation of the aircraft in a three-person crew configuration, then 
they will still need to be performed in the two-person configuration. However, now 
they must be performed by one of the remaining crew members or by some level of 
automation. For the most part, the current generation of aircraft has relied upon the 
third option, automation, to compensate for the missing crew member. This approach 
is not without its own peculiar drawbacks, however, because adding automation adds 
new tasks that are often quite different from, and possibly more difficult than, the tasks
that were eliminated. This problem will be illustrated in more detail later. For now, 
let us state one general rule: All designs begin from an understanding of the tasks to 
be performed. With that summary, let us return to our discussion of MANPRINT. 


Task analysis: A task analysis is a process of documenting the steps a user is 
required to perform (actions and/or cognitive processes) to complete a task. 
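A task analysis is, at bottom, a structured record of steps, each tagged as an overt action or a cognitive process. The sketch below shows one way such a record might be kept. The step list is a hypothetical decomposition of an ATC-assigned radio frequency change, written for illustration only; it is not drawn from any published task analysis, and the `Step`/`StepKind` names are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class StepKind(Enum):
    """Whether a documented step is an overt action or a mental (cognitive) process."""
    ACTION = "action"
    COGNITIVE = "cognitive"


@dataclass(frozen=True)
class Step:
    description: str
    kind: StepKind


# Hypothetical decomposition of a frequency change assigned by ATC (illustrative only).
FREQUENCY_CHANGE = [
    Step("Listen to the ATC instruction", StepKind.COGNITIVE),
    Step("Read back the frequency and call sign", StepKind.ACTION),
    Step("Hold the frequency in short-term memory", StepKind.COGNITIVE),
    Step("Turn the frequency selector knobs", StepKind.ACTION),
    Step("Compare the displayed frequency with the remembered value", StepKind.COGNITIVE),
]


def summarize(steps: list[Step]) -> dict[StepKind, int]:
    """Count how many documented steps are actions versus cognitive processes."""
    counts = {kind: 0 for kind in StepKind}
    for step in steps:
        counts[step.kind] += 1
    return counts


if __name__ == "__main__":
    for kind, n in summarize(FREQUENCY_CHANGE).items():
        print(f"{kind.value}: {n}")
```

Separating actions from cognitive steps matters for design: automating away the knob-turning does not remove the remembering and checking.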


3.5.2 Personnel 

Knowing the tasks to be performed allows the second element of MANPRINT to
be assessed. “Personnel” refers to the qualifications and characteristics of the people
who will operate and maintain the system. At its most rudimentary level, it refers to
the general mental capacity or intelligence of the operators and maintainers. Usually,
complex tasks require more mental capacity than simple tasks. However, the tasks
required to be performed by the operator and maintainer may also demand physical
strength (e.g., the mechanic may need to lift heavy objects above his or her head
while making certain repairs), color vision (air traffic controllers and pilots must
be able to distinguish the colors of the light gun signals used by the tower during radio
failures), or spatial reasoning ability (required of pilots and air traffic controllers to
understand the potential for traffic conflicts), to give a few examples.

Taken together, these personnel characteristics required for task performance 
amount to a specification for the human operators and maintainers, in much the same 
sense as the specifications given for the materials used to construct the airframe 
or power plant. Just as using substandard materials that do not meet their
specifications can lead to poor performance or failure, so can relying on people who
do not meet theirs. Ensuring that the human
components meet these specifications involves personnel selection and training, both 
of which are addressed in other chapters of this book. 

3.5.3 Training 

Training, as indicated earlier, is the third major component of the MANPRINT 
program. Regardless of one’s attitudes about the military or military service, it is 
undeniable that the military services are consummate practitioners of the art and 
science of training. Although the numbers of trainees and the quality of the training 
certainly vary from nation to nation, among the Western democracies with which 
the authors are familiar, the training programs are most impressive. Consider, for 
example, the challenge of taking a person who has never flown an aircraft (perhaps 
has never even been in an aircraft) and in the space of about a year, transforming 
that person into a fully qualified military pilot, capable of operating a sophisticated 
aircraft under conditions considerably more demanding than those faced by civilian
counterparts. This is a most impressive accomplishment, and it is made possible by 
a highly structured approach to the incremental accumulation of knowledge by the 
aspiring pilot. (See Chapter 5 on training for a detailed description of the systems 
approach to training used by the services.) 

Of course, this training program must be appropriate for the system that the 
trainee will eventually operate. A training program that only taught, for example, 
navigation by the use of nondirectional radio beacons or even VOR, without mention 
of GPS or distance measuring equipment (DME), would likely produce graduates
who were unsuited for the tasks that they must perform in an operational setting. 
Another general rule that we might state at this point is that training content is 
driven by the equipment that will be used and the tasks that will be performed by 
the graduates. 

Once again we see that tasks are a central concern. In addition, however, training
content and duration are also driven by the personnel entering training. The
military services select high-quality personnel from off the street (these are often
referred to as “ab initio,” meaning “from the beginning”) with no prior experience
to train as pilots and mechanics. This choice of personnel means that the training
schools must teach the new trainees absolutely everything they need to know to
operate or maintain an aircraft. Nothing can be assumed to be known when they
report for training. Contrast this with the process used by most commercial airlines,
which select their pilots from a pool of individuals who already possess a
pilot license and who may have many thousands of hours of experience. Certainly,
these trainees can be assumed to know the basics of common piloting tasks, and the
airlines’ training programs are designed to build upon that foundation, providing
aircraft-specific training leading to a type rating or training on company policies
and procedures.

The interaction of manpower, personnel, and training considerations should now 
be clear. Each of these factors may be traded against the others in the design of 
a new system. For example, reducing crew size may require finding more capable 
crew members who can handle the additional tasks. Using lower quality personnel 
may mean that additional training is required to achieve the same standard of performance.
Skimping on the human factors aspects of the crew interface may mean
that additional crew members are required to perform the complex tasks that a better 
design would have eliminated. 


Design, guard, warn, train. This is the chain of activities, given in order of 
precedence, for building human considerations into the system development 
process. Early changes to the design of a system are more effective and less 
costly than later attempts to guard against operator error, to warn operators of 
hazards, or to train operators to use the system despite its inherent problems. 
Despite its popularity, training should be viewed as the solution of last resort. 


3.6 AN EXAMPLE: DESIGN OF THE FUEL GAUGE 

Our discussion thus far, intended to give a general idea of how information gained from
aviation psychology influences system design, has been fairly abstract. Now, we will take a
look at a concrete example, the ubiquitous fuel gauge. The fuel gauge, at least in light 
aircraft, is possibly the simplest gauge in the cockpit, so just what can be said about it 
from the standpoint of aviation psychology? As it turns out, there is quite a lot. 

Why worry about the fuel gauge? Fuel-related accidents and incidents are far from rare:
in the United States, approximately 10% of accidents involving general
aviation aircraft are attributed to fuel management, including fuel starvation (fuel 
not being fed to the engine) or fuel exhaustion (no fuel left) (Aircraft Owners and 
Pilots Association 2006). Also, about 20% of general aviation pilots report that 
they have been so low on fuel that they were worried about making it to an airport 
(Hunter 1995). One reason for these statistics may be the lowly fuel gauge—unseen, 
unheeded, misunderstood. 

Where should it be placed? There is a limited amount of space on the panel of an 
aircraft, so how does one decide what to put where? Previous studies have resulted 
in the classic “T” arrangement of the primary flight instruments. Beyond that 
arrangement, which is specified in the aircraft certification requirements by most 
civil aviation authorities, where should all the other dials, knobs, lights, and indicators
be placed?

How big should the gauge be? Because panel availability usually dictates the 
maximum size of the instrument, this comes down to the question of how large the 
indicator pointer and text should be. How should it be labeled? Many fuel gauges 
in light aircraft look much like their automotive counterparts, with markings for 
FULL, 3/4, 1/2, 1/4, and EMPTY. At first glance, these labels seem entirely satisfactory,
but consider the mental effort required by the pilot to extract usable information 
from such a display. Let us pose the question, “Does the pilot care that the tank is 
half full?” We propose that the answer is no. In fact, what the pilot cares about is how 
much longer the aircraft can remain airborne without the engine quitting (that is, 
remaining hours and minutes of available fuel) or how much farther it can fly without 
becoming a glider (how many miles the available fuel will take us). The answer to 
either of these two questions will tell the pilot if he or she can safely continue the 
flight or whether to consider a diversion. 

Certainly the answers can be derived from the current markings, but to do so the
pilot must perform some intervening mental arithmetic. First, he or she must convert the
instrument reading (let us say “1/2”) into gallons. Thus, the pilot looks at the gauge,
sees that it reads 1/2, and, remembering the total fuel capacity from the pilot’s operating
handbook (POH), calculates that there are now 10 gallons of fuel remaining
(half of the 20 gallons of total usable fuel). Next, the pilot must compute the flight
time that 10 gallons will afford. He or she must recall the fuel consumption rate of
the engine at this particular pressure altitude and power setting (or find the POH and
look up the data) and then compute the hours of remaining fuel by doing a little mental
division. All of this must be done, of course, without substantial error and while
simultaneously flying the aircraft, navigating, and communicating.
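The two mental steps just described can be written out explicitly, which makes plain how much arithmetic the pilot is being asked to do in flight. This is a minimal sketch: the half-tank reading and 20 gallons of usable fuel come from the example above, while the 8 gallons-per-hour burn rate is an assumed value for illustration, not a figure from any particular POH.

```python
from fractions import Fraction


def gallons_remaining(gauge_reading: Fraction, usable_capacity_gal: float) -> float:
    """Step 1: convert the gauge fraction (e.g., 1/2) into gallons of usable fuel."""
    return float(gauge_reading) * usable_capacity_gal


def endurance_minutes(gallons: float, burn_rate_gph: float) -> float:
    """Step 2: divide fuel on board by the burn rate for the current pressure
    altitude and power setting, then convert hours to minutes."""
    if burn_rate_gph <= 0:
        raise ValueError("burn rate must be positive")
    return gallons / burn_rate_gph * 60.0


if __name__ == "__main__":
    fuel = gallons_remaining(Fraction(1, 2), usable_capacity_gal=20.0)
    # 8 gal/h is an assumed burn rate, used only for illustration.
    minutes = endurance_minutes(fuel, burn_rate_gph=8.0)
    print(f"{fuel:.1f} gal remaining, about {minutes:.0f} minutes of powered flight")
```

Under these assumptions, a half-full 20-gallon tank at 8 gallons per hour yields 10 gallons, or 75 minutes, a result the pilot must currently reach by memory and mental division.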

What is the alternative? From the foregoing discussion, certain aspects of an 
alternative design for the fuel gauge should be evident. First, the fuel gauge needs to 
be located at a place on the instrument panel in which it will be noticed by the pilot. 
Preferably, this should be very close to the normal instrument scan of the primary 
“T” instruments. Second, an alerting mechanism should be incorporated to draw the 
pilot’s attention to certain prespecified conditions (e.g., reaching a specific level of 
remaining fuel). Third, the fonts and symbols used on the display should be of sufficient
size and illumination to be clearly visible under all operating conditions,
without interpretation error. Finally, the scaling of the gauge should be changed so
that it presents the information the pilot actually needs, in a form that requires no extra
mental effort to process.
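A gauge built along these lines would do the conversion itself and show time remaining directly, with an alert at a prespecified level. The sketch below is hypothetical: it assumes the system can sense both fuel quantity and fuel flow, and the 30-minute alert threshold is an illustrative default, not a regulatory or manufacturer value.

```python
from dataclasses import dataclass


@dataclass
class EnduranceReadout:
    minutes_remaining: int
    alert: bool

    def text(self) -> str:
        """Render time remaining as hours:minutes, the quantity the pilot cares about."""
        hours, minutes = divmod(self.minutes_remaining, 60)
        label = f"{hours}:{minutes:02d} REMAINING"
        return f"LOW FUEL {label}" if self.alert else label


def endurance_readout(fuel_gal: float, fuel_flow_gph: float,
                      alert_threshold_min: int = 30) -> EnduranceReadout:
    """Convert sensed fuel quantity and fuel flow into time remaining, and flag an
    alert when it falls below the threshold (30 minutes is an assumed default)."""
    if fuel_flow_gph <= 0:
        raise ValueError("fuel flow must be positive")
    # Truncate rather than round so the display never overstates endurance.
    minutes = int(fuel_gal / fuel_flow_gph * 60)
    return EnduranceReadout(minutes, alert=minutes < alert_threshold_min)


if __name__ == "__main__":
    print(endurance_readout(10.0, 8.0).text())
    print(endurance_readout(3.0, 8.0).text())
```

With 10 gallons at 8 gallons per hour the readout is `1:15 REMAINING`; with 3 gallons it becomes `LOW FUEL 0:22 REMAINING`. Truncating the minutes is a deliberately conservative choice.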

If a human-centered design is accomplished, then even a simple instrument such 
as a fuel gauge can be substantially improved so as to reduce operator error and 
improve performance. The key is to consider the display or control from the point of 
view of the operator and the underlying utility or purpose that the display or control 
serves. That is, what is the real need that is met by the control or display? In the case 
of the fuel gauge, the real purpose is to inform the pilot of how much longer powered 
flight can be maintained. It is not to tell the pilot how many gallons of fuel remain 
unconsumed. That information, while easy for the aircraft designer and manufacturer
to implement, represents but the first step in the process that provides the pilot
with the information that is really needed. It is incumbent upon the pilot to insist that
designers and manufacturers not take the easy path, but rather create systems and 
hardware optimal for the operator to use—not for the manufacturer to build. 

3.7 INTERACTING WITH THE SYSTEM 

The importance of crafting the interaction between the system and the human operator
goes beyond the fairly simple issues of display markings. It also extends to the
nature, sequencing, and amount of information provided to the pilot. Short-term 
memory is the term applied to human memory for information presented and retained 
for a fairly short time span—typically, on the order of a few seconds to a very few 
minutes. A common illustration from aviation would be the recall and read-back 
of a new frequency assigned by air traffic control. Typically, the following is the 
sequence of events: 

• ATC sends a voice radio message: “Aircraft 123, contact Center on 137.25.” 

• The pilot of the aircraft responds by saying, “Contact Center on 137.25, 
Aircraft 123.” 

• The pilot of the aircraft must then remember (hold in short-term memory)
the value “137.25” while he or she reaches down and turns the radio frequency
selector knobs to the appropriate setting.

Between the time that ATC says “137.25” and the time the pilot completes the 
action of switching the radio, a period of approximately 5-10 seconds may elapse. 
During that time, the pilot must keep the value “137.25” in his or her short-term 
memory. Usually, this is done without error, although on occasion pilots will make 
mistakes and enter the wrong frequency. One reason this happens only rarely is that 
both the span of information to be recalled and the length of time are short relative 
to the capacity of humans. In this example, the span is five digits. 

Previous research has demonstrated that the short-term memory capacity of
humans is around seven digits. The best-known study of this phenomenon (Miller
1956) refers to this as the “magical number seven, plus or minus two.” As the number of digits
to be recalled exceeds this “magical number,” the rate of errors increases rapidly.
For this reason, well-designed systems avoid requiring humans to hold more than
seven digits (or other bits of information, such as words) in their short-term memory.
Telephone numbers, for example, seldom exceed seven digits and, in addition, take
advantage of chunking to improve recall. “Chunking” refers to grouping the digits
so as to make them more memorable. For example, instead of listing a number as
1234567, the number is given as 123-4567, or 1 23 45 67. Both of the latter arrangements
are much less susceptible to recall errors.
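Chunking is a simple regrouping of the same digits, and the design rule above can be expressed as a check on how many separate items a message asks the listener to hold. The sketch below assumes the seven-item limit discussed in the text as a rough rule of thumb; the function names are illustrative.

```python
SPAN_LIMIT = 7  # rough rule of thumb from the "magical number seven" discussion


def chunk_digits(digits: str, group_sizes: list[int], sep: str = "-") -> str:
    """Split a digit string into groups of the given sizes, e.g. [3, 4] -> '123-4567'."""
    if sum(group_sizes) != len(digits):
        raise ValueError("group sizes must add up to the number of digits")
    groups, start = [], 0
    for size in group_sizes:
        groups.append(digits[start:start + size])
        start += size
    return sep.join(groups)


def exceeds_span(items: list[str], limit: int = SPAN_LIMIT) -> bool:
    """True if the message asks the listener to hold more separate items than the limit."""
    return len(items) > limit


if __name__ == "__main__":
    print(chunk_digits("1234567", [3, 4]))            # 123-4567
    print(chunk_digits("1234567", [1, 2, 2, 2], " "))  # 1 23 45 67
    print(exceeds_span(list("123456789")))            # nine separate digits
```

Counting chunks rather than raw digits is the point: nine unrelated digits exceed the limit, while the same number delivered as two or three familiar groups does not.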

Two other psychological phenomena that influence the recall of information are
the serial position effect and confirmation bias. First, let us consider the serial position
effect. When humans learn a list of words or other information, they tend
to recall the first and last items best. That is, they will recall the first word
in the list and the last word in the list better than those that appeared in the middle.
Consider how this might be important to a pilot when receiving a weather briefing.
The research suggests that the pilot is much more likely to recall the first and last things
presented than those in the middle. Might the weather service take this into account
by placing the most important information (perhaps the information most critical to
the safety of the flight) at the start and end of the briefing? This would maximize the
likelihood of the pilot’s recalling this critical information.
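The suggestion can be stated as an ordering rule: put the most safety-critical item first, repeat it last, and leave the remaining items in their usual briefing order between them. The sketch below is a hypothetical illustration of that rule; the numeric priority scores and the briefing items are invented for the example and do not reflect any weather service's actual practice.

```python
def order_for_recall(items: list[tuple[str, int]]) -> list[str]:
    """Place the highest-priority item at the start and repeat it at the end,
    keeping all other items in their original (standard-format) order.

    items: (text, priority) pairs; a larger priority means more safety-critical.
    """
    if not items:
        return []
    # max() returns the first index among ties, so earlier items win ties.
    critical_index = max(range(len(items)), key=lambda i: items[i][1])
    critical = items[critical_index][0]
    middle = [text for i, (text, _) in enumerate(items) if i != critical_index]
    return [critical, *middle, critical]


if __name__ == "__main__":
    briefing = [  # hypothetical briefing items and priority scores
        ("Surface winds 270 at 10 knots", 1),
        ("Convective SIGMET along route", 5),
        ("Altimeter 29.92", 1),
        ("Destination forecast IFR after 2100Z", 4),
    ]
    for line in order_for_recall(briefing):
        print(line)
```

Repeating the critical item, rather than choosing start or end, follows the "preferably, both" advice for safety-critical information.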

But it is not just the position of information that affects the pilot’s recall. The
predisposition of the pilot to receive information also comes into play. Psychological
research has shown that humans tend to look for information that confirms or supports
their pre-existing beliefs or views of the world. This tendency is called confirmation
bias. In the context of the weather briefing example just given, consider
how confirmation bias might influence the pilot’s recall or acceptance of information.
If the pilot has already decided that the weather is adequate for the flight,
then any information supporting that preconception will draw his or her notice and
be recalled, whereas information not supporting that preconception is likely to be
ignored and forgotten.

Taken together, short-term memory limitations, the serial position effect, and confirmation
bias can be seen to have a significant impact on how information should be
presented to pilots:

• It is clear that pilots should not be expected to hold large amounts of information
in their short-term memory. Information presented early in a briefing
may have been forgotten or displaced by information presented later.
Thus, asking pilots to draw conclusions and make judgments based on comparisons
or combinations of data presented over the course of an extended
weather briefing is unrealistic. At a minimum, these data must be chunked,
or combined in meaningful ways, so as to reduce the memory burden on
the pilot.

• Information particularly germane to the safety of a flight should be placed at 
the beginning or end (preferably, both) of the briefing to maximize recall. 

• Both the weather briefer and pilots need to be aware of the tendency to attend 
selectively to information that confirms pre-existing concepts. Because the 
weather briefer usually adheres to a standard format, there is less likelihood 
that he or she will selectively brief the pilot based on the briefer’s biases, but 
there has been no research that demonstrates that effect. However, the effect 
of confirmation bias on the receiver of information is well established, and 
only adherence to a disciplined approach to flight planning will allow the 
pilot to overcome this tendency. 

3.8 CURRENT ISSUES 

The issues currently facing aviation psychology and human factors are an outgrowth of
the adoption of the glass cockpit design for air transport aircraft over two decades ago,
as well as the gradual infiltration of computers and computer-based technology into
the flight decks and air traffic control systems during that period. The introduction of
new flight control and management systems has not been without its own set of difficulties.
Although the FMS takes over many of the tasks previously performed by crew
members, it introduces new tasks of its own. These tasks, which might be referred to
globally as “managing the management system,” involve more planning and problem
solving in place of the old psychomotor tasks now performed by the FMS.

The design of the human interface to these systems has not been entirely satisfactory,
and issues of mode confusion still arise. A system in which the
human is kept constantly aware of the present and future states of the system
has not yet been fully achieved. This problem is likely to grow even more acute
as FMS-like systems are installed in growing numbers of general aviation aircraft,
where they may be operated by pilots with very limited experience and training. A
completely intuitive design for these systems will become a necessity because the
training required to operate the current systems will not be feasible.

This lack of an intuitive design is evident in the current generation of GPS navigation
systems for light aircraft. Although GPS allows for highly accurate three-dimensional
navigation at almost any point on the surface of the Earth, its implementation
has been subject to a great deal of criticism. Almost without exception, the displays
used for GPS in light aircraft are small and the controls crowded together. Further,
the functions of the controls are mode dependent, and tasks that in
the traditional VOR navigation system required only a few steps now require extensive
scrolling through menus and multiple function selections. For example, a VOR
approach requires approximately five discrete steps, while the corresponding GPS
approach requires over a dozen.

Human factors issues associated with GPS displays and controls have been the 
subject of extensive research in recent years. Leading these efforts have been the 
research arms of the aviation regulatory agencies, particularly in the United States, 
Canada, and New Zealand. Researchers at the Civil Aerospace Medical Institute of 
the Federal Aviation Administration have conducted several studies to identify problems
in the usability of GPS receivers. These studies have ranged from evaluations
of individual receivers to more wide-ranging global assessments and have identified 
many shortcomings of the GPS receiver interface. 

Wreggit and Marsh (1998) examined one specific unit that was believed to be 
typical of a class of devices available at the time. They had nine general aviation 
pilots perform 37 GPS-related tasks requiring waypoint setting, GPS navigation, 
and GPS data entry and retrieval. Their results indicated that a number of the menu 
structures used by the device interfered with the pilots’ successful entry of data, editing
of stored data, and activation of functions. Based on their findings, Wreggit and
Marsh provided recommendations for the redesign of the interface structure. Some 
of the specific recommendations included consistent assignment of a given function 
to one button, provision of consistent and meaningful feedback, and provision for an 
“undo” or “back” function that would reduce the number of button presses. 

Williams (1999a, 1999b) conducted an extensive review of user interface problems
with GPS receivers, using data collected from interviews with subject matter
experts in the Federal Aviation Administration and from an inspection of the
observation logs from an operational test of a GPS wide area augmentation system.
Although he identifies several interface issues associated with the displays and controls,
Williams (1999a, p. 1) notes that “probably the most significant feature of GPS units,
as far as the potential for user errors is concerned, is the sheer complexity involved
in their operation.” He points out that one measure of this complexity is the size
of the instruction manual that accompanies each unit. The operation of the radio
and display for a traditional VOR navigation system could be explained in, at most,
10 pages; however, manuals for GPS receivers typically contain 100-300 pages.
Although there does not seem to be any published research on the subject, one can
wonder just how many pilots have actually read all the instructions that accompany
their GPS receiver. One might also speculate on how much of that material is actually
retained.*

In addition to the overriding issue of complexity, Williams (1999a, 1999b) identifies
a large number of specific human factors issues that detract from GPS receiver
usability, many of which are hauntingly reminiscent of the problems identified by
Fitts and Jones a half-century earlier. Some examples of the issues identified by
Williams include:

Button placement. Inadvertent activation of the GPS buttons, just like inadvertent
activation of landing gear and flaps, is made more likely by the poorly
considered placement of the buttons. In the example provided by Williams,
a manufacturer has elected to place the “clear” button between the “direct-to”
and the “enter” buttons. This is an unfortunate arrangement because
activation of the “direct-to” button is normally followed by the “enter” button.
Placement of the “clear” button between these two buttons makes it
much more likely that the pilot will activate the “clear” button when the
intention was to activate the “enter” button. Recovery from such an error
may entail considerable reprogramming of the GPS, perhaps at a time when
the pilot is experiencing high workload from other activities, such as executing
a missed approach.

Knob issues. Many GPS receivers use a rotary knob to select and enter information.
Often these knobs are used to select the alphanumeric characters
of airports, VORs, and other navigation waypoints. Some of the knobs do 
not allow users to backtrack, so if they overshoot the character they wanted, 
they must continue turning until they go through the entire list again. This 
can significantly increase the head-down time required to program the 
receiver—a problem that is particularly acute while in flight. Furthermore, 
these knobs may function in more than one physical position: either pulled 
out or pushed in; the two positions provide entirely different functionality. 
Because there is no signal, other than a faint tactile sensation, to indicate 
which mode the knob is in, pilots can only determine what the knob is 
going to do by turning it and observing what happens. Clearly, this is an 
arrangement ripe with potential for serious errors, particularly when the 
pilot’s attention is directed elsewhere. 

* In addition to printed manuals, manufacturers also offer online tutorials on the principles of GPS
navigation. How much this helps with the actual use of their systems during flight is debatable. Trimble
Navigation: http://www.trimble.com/gps/index.shtml; Garmin Navigation: http://www8.garmin.com/aboutGPS/
Button labels. Williams (1999a, p. 4) notes that “buttons that perform the same
type of task on different units can have different labels.” The lack of uniformity,
coupled with the complexity mentioned earlier, makes it difficult for a
pilot familiar with one GPS system to use a different system.

Automatic versus manual waypoint sequencing. Readers of pilot reports captured
by the Aviation Safety Reporting System (ASRS)* soon come to recognize
the familiar question posed by pilots of aircraft with modern FMSs.
That question is typically “What is it doing?” Alternatively, the question
may be stated as “Why is it doing that?” The “it” in both questions is the
FMS and/or autopilot, and the questions are raised because the pilot is
suffering from what is commonly termed mode confusion. The aircraft is
behaving in a way that is not consistent with the pilot’s mental model of
what it should be doing. This discrepancy arises because the complexity of
the FMS allows it to operate in multiple modes. If the pilot thinks that
the FMS is in one mode, but it is actually in another, then truly unexpected
things can happen, occasionally resulting in accidents. One such example is
the Air France Airbus A320 that crashed at Mulhouse-Habsheim Airport,
France, following a low-altitude flyby (Degani, Shafto, and Kirlik 1996).

Regrettably, pilots of general aviation aircraft, who are often envious of the equipment
and capabilities of transport category aircraft, now have the dubious honor
of sharing the problem of mode confusion with their airline transport brethren.
Williams (1999a, p. 4) reports that “one of the most often cited problems...involved
either placing the receiver in a mode where it automatically sequences from one
waypoint to the next during the approach or in a nonsequencing mode.” Winter and
Jackson (1996) reported that pilots frequently forgot to take the GPS receiver out of 
the “hold” function after completing the procedure turn. Because of that error, they 
were unable to proceed to the next approach fix. 
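The trap described by Winter and Jackson can be reproduced with a toy state model: a receiver that advances to the next waypoint automatically unless sequencing has been suspended for a hold. The class below is hypothetical and deliberately simplified; it does not represent the logic of any actual GPS receiver, and the waypoint names are generic labels.

```python
from dataclasses import dataclass


@dataclass
class ToyGpsSequencer:
    """Hypothetical model of approach waypoint sequencing with a suspend (hold)
    mode; not the logic of any real receiver."""

    waypoints: list[str]
    active: int = 0          # index of the waypoint the receiver is navigating to
    suspended: bool = False  # True while automatic sequencing is suspended

    @property
    def active_waypoint(self) -> str:
        return self.waypoints[self.active]

    def enter_hold(self) -> None:
        """Suspend automatic sequencing to fly a hold or procedure turn."""
        self.suspended = True

    def resume_sequencing(self) -> None:
        """The step pilots reportedly forgot: return to automatic sequencing."""
        self.suspended = False

    def waypoint_passed(self) -> str:
        """Called as the aircraft crosses the active waypoint; advances only when
        sequencing is not suspended and a later waypoint exists."""
        if not self.suspended and self.active < len(self.waypoints) - 1:
            self.active += 1
        return self.active_waypoint


if __name__ == "__main__":
    gps = ToyGpsSequencer(["IAF", "FAF", "MAP"])
    gps.enter_hold()              # procedure turn flown at the IAF
    print(gps.waypoint_passed())  # still IAF: sequencing is suspended
    gps.resume_sequencing()
    print(gps.waypoint_passed())  # FAF
```

Nothing in the model is wrong; the hazard is that the `suspended` flag is invisible unless the display makes it salient, so the pilot's mental model and the receiver's state can silently diverge.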

Many of these deficiencies were also noted by Adams, Hwoschinsky, and Adams 
(2001) in their review of adverse events attributed to GPS usage. In addition to the 
usability issues identified by Williams (1999a, 1999b), such as button placement and 
display size, Adams et al. also highlighted problems that arose from pilot overreliance
on GPS, programming errors, and lack of knowledge about the use of the GPS
receivers. As a further illustration of the relative complexity of the GPS receivers, 
Adams and colleagues noted that although 5 steps were necessary to perform an 
approach using the traditional VOR system, 13 steps were required in an equivalent 
GPS approach. 

Earlier, we noted that the traditional approach to the development of highly usable 
systems was based on the sequence: design, guard, warn, and train. Although GPS 
navigation arguably represents a remarkable step forward in the technology of air 
(and surface) transportation, GPS receivers (at least those marketed for general aviation
aircraft) represent an equally remarkable failure to adhere to this philosophy. A
failure to design usable receivers that prevent or mitigate errors makes it necessary
to warn users about their shortcomings and to attempt to remediate the problems by
training users so that they do not fall victim to the interface idiosyncrasies. It is sad
to note that, 60 years after Fitts and Jones showed us how errors could be prevented
by a focus on the user and simple design changes, many designers (and regulators)
still have not taken their lessons to heart. It is a situation that is cogently described by
Dekker (2001) in a report that he aptly titled, “Disinheriting Fitts & Jones ’47.”

* http://asrs.arc.nasa.gov/main_nf.htm

3.9 SUMMARY 

The central message that the reader should take away from this chapter is that 
systems—mechanical systems, social systems, training systems, display systems— 
must be designed so that they conform to the characteristics of their users and 
the tasks that they must perform. The design of systems is an engineering process 
marked by a series of trade-off decisions. The designer may trade weight for speed, 
increased power for increased reliability, or the size and legibility of displays for 
the presentation of additional information. The list is almost endless. In each of 
these design decisions, the engineer is striving to meet some design criteria, without
being able to meet all criteria equally well at the same time. In our everyday
world, we often wish to satisfy competing criteria. For example, we might wish 
to have a very large house and simultaneously wish to have a very small monthly 
house payment. Unless a rich uncle dies and leaves us a pot of money, we are forced 
to compromise with a house that is big enough and a mortgage payment that is not 
too big. 

Usually, engineers produce a workable design that does not sacrifice the elements 
critical to successful operation of the system for the sake of competing criteria. 
Sometimes, however, they produce controls with the same knobs (it saves production 
costs to have all the knobs identical), leading to confusion during moments of high 
workload. They may also produce instrument panels in which essential, if rarely 
used, information is hidden physically, on a dial that cannot be seen without a great 
deal of effort, or logically, as part of a multifunction display system in which the 
needed information lurks beneath two or three levels of menus. However, the reasons 
that systems are poorly designed from the standpoint of the human user do not serve 
as excuses for those designs. 

The reader should now be aware of some of the features and considerations that 
go into the production of usable aviation systems. We hope that readers will use that 
knowledge at the least to be informed consumers of those systems and, even more, to 
become active advocates for improved aviation systems. 
4.1 INTRODUCTION

Highly skilled people are essential for the airlines to operate efficiently, safely, and
with satisfied customers. In a military context, the organization will also have other
objectives, but skilled workers are just as important. For the individual employee,
it is important to have a job that is sufficiently challenging, where the individual
is appreciated and rewarded in relation to how well he or she performs the job. To
achieve this, it is important to have both a good selection system and an effective
training program for candidates who have been selected. A successful selection process
will lead to lower dropout rates during training and an increase in the number of
students completing the program. In addition, a well-designed selection system will,
in the long term, contribute to a more effective and resilient organization; however,
this claim may be harder to document than lowered dropout rates. Ideally,
the selection of personnel should be based on methods that we know work for this
purpose. That is, there should be empirical evidence that the methods can tell us
something about how well the person will perform the job or training he or she
seeks.

Most of the research on selection methods in aviation has addressed the selection
of pilots and, in recent years, of air traffic controllers (ATCOs). The selection of
military pilots is often based on young people without any previous flying experience,
whereas selection for civilian airlines includes experienced pilots as well as people
without any flying experience (ab initio). Most airlines probably prefer to hire
experienced pilots and thus avoid a long and expensive period of training.

In most cases, the selection of pilots and air traffic controllers is a comprehensive,
step-by-step process, at least for ab initio selection. That is, it usually starts with
a large number of applicants who are tested with a range of psychological tests.
In addition, applicants must meet a number of formal requirements, such as medical
requirements, a clean police record, and sometimes prior education (e.g., completed
high school or college). These formal requirements, however, may vary from
organization to organization. After initial testing, the best candidates proceed to
further testing, an interview, and often a more extensive medical examination. For
both ATCO and pilot selection, usually fewer than 10% of the applicants will be
accepted into the training program.

When choosing methods for the selection process, it is important to start with a
thorough review of the job in order to determine what skills, abilities, and qualities
the person needs to possess. Such a systematic review is called a job analysis, and
it involves a detailed survey of the tasks involved in the job. There are several ways
to do this, for example, by observing workers or interviewing them. The literature
describes various techniques that can be used to obtain information about the work
content and the capabilities or skills needed to perform the tasks.


4.2 JOB ANALYSIS 

A job analysis consists mainly of two elements: a job description and a person
specification. A job description is an account of the activities or tasks to be performed;
a person specification lists the skills, expertise, knowledge, and other physical and
personal qualities the person must have to perform these activities. Several methods
can be used to carry out a job analysis, and a job analysis can serve purposes other
than forming the basis for a selection process. Accordingly, some job analysis methods
emphasize outlining the tasks to be solved or the behaviors that need to be performed,
while others focus on the qualities the person should have. A job analysis therefore
covers both the job requirements (output) and the personnel requirements (input). If
the purpose of a job analysis is to determine what should be measured as part of the
selection process, more emphasis should be put on the person specification. Figure 4.1
provides an overview of these two perspectives.

One of the best-known job analysis methods is the critical incident technique, which
was developed by Flanagan during World War II to map the job performance of fighter
pilots (Flanagan 1954). The main purpose of this method is to identify behavioral
examples of good and poor performance. Most people would manage to perform many
tasks, so those tasks are of little interest; others are more demanding, and not everyone
will be able to perform them successfully (i.e., critical incidents). The method uses
experienced workers as informants, who are asked to describe critical tasks and what
characterizes both good and poor ways of solving them. It is also important to examine
the extent to which these experts agree on what the critical tasks are and on the
characteristics of good and poor work performance. The work then proceeds by sorting
the task descriptions into categories; for example, many tasks may require good
communication, and many may require the ability to perform calculations. Finally, a
list is constructed of the abilities and skills the person must have to perform these
tasks.

FIGURE 4.1 Two perspectives on job analysis.

Another method, the repertory grid technique, was developed by George Kelly
(1955). Here the experts are asked to imagine good, moderate, and poor workers and
then to describe how these workers are similar and different in relation to work
performance. A third technique is Fleishman’s job analysis survey method (1975).
Employees are asked to indicate, on a seven-point Likert scale, the extent to which
different abilities, personality traits, and skills are relevant to job performance.
Respondents consider seven main areas: cognitive abilities, psychomotor skills,
physical demands, sensory capabilities, knowledge and skills, cooperation, and social
skills. Each area has a number of subcategories; for example, cognitive abilities
include a total of 21 categories, such as spatial orientation, time sharing, and attention.
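To make the rating step concrete, the short Python sketch below averages several
experts’ seven-point ratings for each ability and ranks the abilities whose mean
reaches a cutoff. The ability names, the ratings, and the cutoff of 5.0 are hypothetical
illustrations, not values from Fleishman’s survey, and a real job analysis would also
examine how well the raters agree.

```python
# Hypothetical 7-point ratings (1 = not relevant, 7 = extremely relevant)
# given by four job experts. Names and numbers are illustrative only.
ratings = {
    "spatial orientation": [6, 7, 6, 5],
    "time sharing": [7, 6, 7, 7],
    "attention": [6, 6, 5, 6],
    "written expression": [3, 2, 4, 3],
}


def rank_abilities(ratings, cutoff=5.0):
    """Return (ability, mean rating) pairs, most relevant first,
    keeping only abilities whose mean rating reaches the cutoff."""
    means = {name: sum(scores) / len(scores) for name, scores in ratings.items()}
    ranked = sorted(means.items(), key=lambda item: item[1], reverse=True)
    return [(name, m) for name, m in ranked if m >= cutoff]


for ability, mean_rating in rank_abilities(ratings):
    print(f"{ability}: {mean_rating:.2f}")
```

The resulting ranked list corresponds to the person specification: the abilities that
experts rate as clearly relevant become candidates for measurement in selection.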

4.2.1 Job Analysis for Pilots and Air Traffic Controllers 

Fleishman’s job-analysis method was used in a study of civilian pilots (Goeters,
Maschke, and Eißfeldt 2004). Many of the cognitive abilities on the list were
described as relevant or highly relevant, as were psychomotor and sensory abilities.
Within the cooperative/social skills domain, coping with stress, communication, and
decision making were identified as very important.

For military pilots, a revised version of the Fleishman method was used in a NATO 
study (Carretta, Rodgers, and Hansen 1996) in which pilots from several countries were 
asked to assess 12 critical tasks specific to the job of fighter pilot. They were then asked 
to specify the abilities and skills that were important when conducting these tasks. The 
most important abilities were situational awareness, memory, motivation, and reasoning. 
The least important were reading comprehension, writing, and leadership.

Fleishman’s method was also used in a study of German air traffic controllers
(Eißfeldt and Heintz 2002). In addition to Fleishman’s original scales, a few scales
were added covering personal characteristics such as cooperation, communication,
and the ability to handle stress. The cognitive abilities ranked highest were speed of
closure, visualization, selective attention, and time sharing. Few of the cognitive
abilities received a low score. Visualization involves the ability to imagine objects
and movements in space; selective attention means that an individual is able to
concentrate on a task without being distracted. Time sharing involves the ability to
shift attention quickly between different tasks. In addition, several psychomotor skills
were rated as important, together with sensory abilities and specific knowledge (e.g.,
map reading). Several of the social skills were also highly rated, including stress
resistance, decision making, and cooperation. There were some differences in the
important skills and qualities between air traffic controllers in different functions
(area control, approach, aerodrome control); however, for the most part, these
differences were small.

4.2.2 A Critical Perspective on Job Analysis 

An important methodological question is the extent to which we can rely on the results
of a job analysis. One possibility is that experienced workers who are asked to
assess the capabilities needed to perform the job may overestimate the number of
skills and qualifications that are necessary. This may be more or less conscious, but
it is natural for people to want to present themselves in a favorable light, which can
include overestimating the complexity of their job and the abilities and skills needed
to perform it.

In modern job analysis, more emphasis is placed on uncovering the competence
needed and less on the specific tasks to be solved (see, for example, Bartram 2005).
This is partly a function of the modern labor market, where many jobs are constantly
changing; global assessments may therefore be more useful than a narrow focus on
specific tasks and abilities. However, it may be difficult to achieve a reliable assessment
of the competence needed because competence is a complex concept, often seen as a
mixture of skills, knowledge, motivation, and interests. A meta-analysis of reliability
coefficients from job analyses concluded that descriptions of specific tasks had higher
interrater reliability than more general descriptions of the competence needed
(Dierdorff and Wilson 2003).

The results from a job analysis may be used to select specific tests for the selection
process and also to choose appropriate criteria of work performance for a validation
study. Many people would therefore argue that a job analysis is an important and
necessary first step in a selection process. Meta-analyses have demonstrated, however,
that ability tests predict job performance more or less independently of the occupation
(Schmidt and Hunter 1998). One implication is that a very detailed job analysis may
not always be needed. On the other hand, job analyses of pilots and air traffic
controllers have demonstrated that a number of highly specialized cognitive skills are
important, and a test of general intelligence may not provide an adequate measure of
such abilities.

4.3 PREDICTORS AND CRITERIA 

The methods used to select applicants are called predictors, while measures of work
performance are called criteria. When a psychological test is used to select air traffic
controllers, the test is a predictor. To assess how suitable the test is for this purpose,
we have to conduct a validation study, that is, a study in which applicants’ test results
are compared with their actual work performance or academic results. Both work
performance and academic grades are examples of criteria.
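In its simplest form, the comparison in a validation study is a correlation between
predictor and criterion. The Python sketch below computes a Pearson correlation (the
validity coefficient) between selection test scores and later instructor ratings. The
function name and the eight pairs of scores are made up for illustration; real validity
coefficients are usually far lower than the near-perfect value these toy numbers
produce.

```python
from math import sqrt


def validity_coefficient(test_scores, criterion_scores):
    """Pearson correlation between predictor (test) scores and criterion scores."""
    n = len(test_scores)
    if n != len(criterion_scores) or n < 2:
        raise ValueError("need paired scores for at least two people")
    mean_x = sum(test_scores) / n
    mean_y = sum(criterion_scores) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(test_scores, criterion_scores))
    sxx = sum((x - mean_x) ** 2 for x in test_scores)
    syy = sum((y - mean_y) ** 2 for y in criterion_scores)
    return sxy / sqrt(sxx * syy)


# Made-up numbers: selection test scores and later instructor ratings
# for the same eight trainees.
test_scores = [52, 61, 47, 70, 58, 65, 43, 55]
instructor_ratings = [3.1, 3.8, 2.9, 4.5, 3.2, 4.0, 2.5, 3.6]
print(round(validity_coefficient(test_scores, instructor_ratings), 2))
```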

4.3.1 Predictors in Selection 

Ideally, predictors should be chosen because they measure something relevant to
future work performance, possibly identified through a job analysis. A number of
different methods can be used in a selection process, and only the most common are
described here. The interview is used as a selection method in most professions, and
it may be more or less structured. An interview is highly structured if the questions
are formulated in advance and their order is also predetermined. Sometimes the
interview is conducted toward the end of the selection process, after the less
time-consuming methods have been used. Employers who are hiring probably also feel
the need to meet the person face to face in an interview. Many employers believe that
they have a unique ability to uncover who is best suited to the job and will fit well into
the organization.

Unfortunately, this assumption is often wrong. Unstructured interviews often have
very poor predictive validity, and they frequently fail to identify the right person.
More structured interviews, however, have much higher predictive validity than those
in which questions are chosen at random. For an interview to be effective, it is
important to think through and formulate job-relevant questions to be used with all
the applicants. It is also important to train the interviewers, especially if more than
one person is conducting the interviews. One advantage of the interview is that it also
gives the applicant an opportunity to meet representatives of the organization and ask
questions about the job.

Another type of predictor is the assessment center, which can also be used for
purposes other than selection, for example, in leadership training and promotion.
The assessment center method involves giving the candidate various tasks that are
similar or relevant to the job sought. Often, small groups of people try to resolve a
problem together. This makes it possible to study how people interact with each other,
their leadership abilities, communication skills, and so on. The applicants are observed
by several trained observers, who usually use standardized forms to rate their
performance. The method is time consuming and often takes from half a day to a full
day or more.

Yet another type of predictor is the work sample test, which is a less comprehensive
approach than an assessment center. The candidate performs the same task that he or
she would perform on the job, or a similar one. The idea is that this behavior will
predict similar behavior at a later date. There are often standardized scoring rules for
how the performance should be rated.

A number of psychological tests also may be used as predictors, including ability 
tests and personality tests. Some of these are designed for selection, while others are 
designed for other purposes—for example, clinical use or diagnostics. Tests that are 
intended for special groups or designed for completely different purposes may not 
necessarily be suitable for personnel selection. Psychological tests are frequently 
used for pilot and air traffic controller selection, but less commonly for other groups 
in the aviation industry. Many of the tests used in the selection of pilots and air traffic 
controllers have been developed specifically for these occupational groups. 

Past work experience, school grades, and biographical data are also sometimes
used for selection purposes. If a person already has some work experience, it would
be reasonable to obtain references from former employers. School grades may also
be used in the selection process, in particular to select people for further education.
Biographical data gathered from employment records may also be used to discriminate
between successful and unsuccessful employees, and this information may then be
used to select future candidates. For example, if the best insurance salespeople are
married and own their own homes, then applicants who have these characteristics
should be preferred. The items and their weighting are based on a purely empirical
approach. Some people may argue that the method is unfair because applicants are
selected on the basis of factors over which they have little or no control, instead of
measuring the relevant abilities and skills directly.





Aviation Psychology and Human Factors 



FIGURE 4.2 The validity of different selection methods, listed from highest to lowest:
work sample tests, general mental ability tests, structured interviews, assessment
centers, biographical data, conscientiousness tests, years of education, graphology,
and age. (Based on Schmidt, F. L., and Hunter, J. E. 1998. Psychological Bulletin
124:262-274.)

The final and perhaps most exotic method mentioned here is graphology, or
handwriting analysis. This method involves analyzing a person’s handwriting in order
to determine personality characteristics. Several studies have shown that this method
is not suitable for selection, even though it is currently used in several European
countries for personnel selection decisions (for an overview, see Cook 1998).

The methods mentioned here differ in predictive validity, and they also differ in how
costly and time consuming they are. Meta-analysis has been used to investigate the
predictive validity of the various methods, and the best predictors are work sample
tests, intelligence tests, and structured interviews, with an average predictive validity
of about .50. Age and graphology had no predictive validity (Schmidt and Hunter
1998). An overview of the validity of different methods is presented in Figure 4.2.

4.3.2 Criteria of Job Performance 

Valid criteria for work performance are as important as good predictors when
evaluating the selection process. The choice of criteria can be based on a previous job
analysis in which key tasks and what constitutes good performance have been
identified. The easiest way to examine the predictive validity of a method is to use an
overall criterion. For many jobs, one can argue that this is not adequate and that it
would be more reasonable to apply multiple criteria to describe job performance and
the variety of tasks performed. When conducting a validation study, the researcher
needs to choose which criteria to use or, more conveniently, to combine them into a
composite criterion instead of trying to predict a large number of criteria at the same
time, which could be differentially related to the predictors.

In many cases, instructors or superiors are used to assess performance, and it
is important that this happens in a systematic and reliable way. One way to assess
reliability would be to let two instructors evaluate the same people and then study
the degree of agreement between them. There are a number of known errors in such
person evaluations; for example, if a person is good at one thing, it may be
automatically assumed that he or she performs other tasks equally well (the halo
effect). It may also be difficult to get the observers to use the entire scale; that is, all
performance may be rated as average, or there may be little variation between the
different items rated for each person. Whenever the rating is to be used as a criterion,
there must be variation between individuals and between the various tasks that the
same person performs. If all the candidates perform the task equally well, it is not
suitable as a criterion of work performance.
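The two-instructor check described above can be sketched as a correlation between
the instructors’ ratings of the same candidates. The Python function below is a
hypothetical illustration, not a prescribed procedure; it also refuses to compute a
coefficient when one rater gives everyone the same score, which is the "no variation"
problem just described. Note that a correlation shows whether the raters rank
candidates similarly, not whether they use the same part of the scale.

```python
from statistics import mean, pstdev


def interrater_r(rater_a, rater_b):
    """Pearson correlation between two instructors' ratings of the same candidates."""
    if len(rater_a) != len(rater_b) or len(rater_a) < 2:
        raise ValueError("both raters must rate the same set of at least two candidates")
    sd_a, sd_b = pstdev(rater_a), pstdev(rater_b)
    if sd_a == 0 or sd_b == 0:
        # A rater who gives everyone the same score provides no usable variation.
        raise ValueError("a rater shows no variation, so agreement cannot be estimated")
    mean_a, mean_b = mean(rater_a), mean(rater_b)
    covariance = mean((a - mean_a) * (b - mean_b) for a, b in zip(rater_a, rater_b))
    return covariance / (sd_a * sd_b)


# Hypothetical 1-7 ratings of seven trainees by two instructors.
instructor_1 = [4, 5, 3, 6, 4, 2, 5]
instructor_2 = [4, 6, 3, 5, 4, 3, 5]
print(round(interrater_r(instructor_1, instructor_2), 2))
```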

In order to achieve good interrater reliability, it is important to train the observers
and to specify what constitutes good and poor work performance. In addition,
the criterion needs to have good construct validity, which means that it measures
the construct of interest, for example, leadership or communication skills. There
will frequently be practical limitations on which criteria can be assessed as part of a
validation study, and it is also important that these criteria are seen as relevant to the
organization.

In pilot selection, most of the criteria are usually obtained during training; in many
cases, pass/fail in training is used. Pass/fail is a criterion that is easy to collect and
that the organization regards as important. One problem with this criterion is that it
is a somewhat indirect measure of performance. In some cases, reasons other than
poor performance may cause the candidate to fail training, anything from airsickness
to lack of motivation. Nevertheless, in most cases, the pass/fail criterion seems to work
reasonably well. In a study of students at a Norwegian military flight school, pass/fail
was highly correlated with assessments made by flight instructors (Martinussen and
Torjussen 2004). In this study, pass/fail was taken as a valid measure of pilot
performance.

Another problem with the pass/fail criterion is that it is based on performance during
training and not on actual work performance. However, few studies employ more
long-term criteria of pilot performance, and there may be many reasons for this. A
more long-term criterion would require the validation study to take longer to conduct.
It may also be difficult to find comparable criteria for different jobs, for example, for
pilots working in different airlines. In addition, it is obviously more difficult to
evaluate workers in a real-life setting than during training, where they expect to be
evaluated.

In addition to pass/fail and instructors’ ratings, assessments of graduates’ performance
in a simulator have also been used in validation studies. The situation is similar for
air traffic controllers: validation studies have largely been conducted using criteria
obtained during training or in a simulator.

4.4 HOW CAN WE KNOW THAT PREDICTORS WORK? 

In order to document that predictors are useful in the selection of candidates,
one can perform a local validation study or evaluate meta-analysis results that
summarize relevant validation studies. Local validation studies are so named
because they are conducted in the native setting (company, applicants, tests, and
training) in which the selection system under evaluation would eventually be
employed. Local validation studies are usually performed by correlating test
results with a measure of job performance, for example, performance in a
simulator or assessments made by an instructor or supervisor. Sometimes, several
tests are used in combination, or tests are combined with an interview. In such
cases, one can apply a combined test score or use regression analysis to find a
weighted combination of predictors that gives the highest correlation with the
criterion. In many cases, it may be difficult to conduct local validation studies
because the organization does not hire a sufficient number of people within a
certain time period.
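For the simplest case of two predictors, say an ability test and an interview, the
regression weights that maximize the correlation with the criterion can be computed
from just three correlations. The Python sketch below uses the standard two-predictor
formulas for standardized weights and the multiple correlation R; the function name
and the input correlations are hypothetical.

```python
from math import sqrt


def two_predictor_weights(r1y, r2y, r12):
    """Standardized regression weights (beta1, beta2) and multiple R for two
    predictors, from their correlations with the criterion (r1y, r2y) and
    with each other (r12)."""
    denom = 1 - r12 ** 2
    if denom <= 0:
        raise ValueError("the predictors must not be perfectly correlated")
    beta1 = (r1y - r2y * r12) / denom
    beta2 = (r2y - r1y * r12) / denom
    multiple_r = sqrt(beta1 * r1y + beta2 * r2y)
    return beta1, beta2, multiple_r


# Hypothetical values: ability test r = .40, interview r = .30, and the two
# predictors correlate .20 with each other.
b_test, b_interview, r_combined = two_predictor_weights(0.40, 0.30, 0.20)
print(f"weights: {b_test:.2f}, {b_interview:.2f}; multiple R = {r_combined:.2f}")
```

Each weight depends on the intercorrelation r12 as well as on the predictor’s own
validity, so the weights are specific to the particular set of predictors in the equation.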

4.4.1 Meta-Analysis 

An alternative to conducting a local validation study is to combine previous studies
in a meta-analysis. In order to merge results from multiple studies, the studies must
all supply a common metric or measure of effect. Fortunately, most articles reporting
results from validation studies include correlation coefficients, which are highly
suitable for meta-analysis. Some studies, however, report only the results of multiple
regression analyses, and these cannot be combined with correlations from other
studies. Meta-analysis requires that a standardized index (e.g., the Pearson correlation
coefficient or another measure of effect size) be used and that the results be reported
for each predictor separately. In a regression analysis, the results indicate how well
the combined set of tests predicts a criterion; the regression coefficients depend not
only on the correlation between each test and the criterion, but also on the
intercorrelations with the other predictors included in the equation. Because the
individual contributions of the predictors cannot be separated, the regression
coefficients cannot be used in meta-analyses.

There are several meta-analysis traditions, and the most widely used method
within work and organizational psychology was developed by John Hunter and
Frank Schmidt in the late 1970s. Their method was initially designed to study how
well test validity could be generalized across different settings. The method is
therefore well suited to meta-analyses of validation studies because it takes into
consideration many of the methodological issues relevant in such studies. Hunter and
Schmidt (2003) have described a number of factors or circumstances that may affect
the size of the observed correlation or validity coefficient. These factors, or statistical
artifacts, will influence the size of the correlation coefficients to varying degrees
from study to study.

Three such statistical sources of error are lack of reliability, restriction of range,
and use of a dichotomous criterion (e.g., pass/fail) instead of a continuous measure.
The lower the score reliability, the lower the observed correlation will be. It is
possible to correct for this artifact if the test score reliability is reported in the article
(Hunter and Schmidt 2003):


$$ r_{cor} = \frac{r_{obs}}{\sqrt{r_{xx}\, r_{yy}}} $$


In this equation, $r_{cor}$ is the corrected correlation, $r_{obs}$ is the observed
correlation, and $r_{xx}$ and $r_{yy}$ are the reliabilities of the predictor and the
criterion, respectively. The corrected correlation is an estimate of the correlation that
would have been observed if the variables had been measured with perfect reliability.
In some cases, it is appropriate to correct for lack of reliability in only one of the
variables (e.g., correction for criterion reliability in validation studies) because the
purpose is to evaluate the usefulness of the tests with all the errors and shortcomings
that they may have. The correction is then


$$ r_{cor} = \frac{r_{obs}}{\sqrt{r_{yy}}} $$


If we assume that the observed correlation between the ability test and a criterion
is .40 and that criterion reliability is .70, then the corrected correlation is
$.40/\sqrt{.70} = .48$. This number represents the correlation that would have been
observed if the criterion had been perfectly measured. The lower the score reliability,
the greater the correction will be.
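Both versions of the correction can be expressed in one small function. In the Python
sketch below (the function name is our own), the predictor reliability defaults to 1.0,
which reduces the formula to the criterion-only correction; the final line reproduces
the worked example above.

```python
from math import sqrt


def correct_for_attenuation(r_obs, r_yy, r_xx=1.0):
    """Correct an observed correlation for unreliability:
    r_cor = r_obs / sqrt(r_xx * r_yy).
    Leaving r_xx at 1.0 corrects for criterion unreliability only."""
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_obs / sqrt(r_xx * r_yy)


# The worked example from the text: r_obs = .40, criterion reliability = .70.
print(round(correct_for_attenuation(0.40, r_yy=0.70), 2))  # 0.48
```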

The second factor that affects the size of the correlation is reduced score variation
(restriction of range) on one or both variables as a result of selection. This occurs
when the relationship between test scores and subsequent performance is studied
only for those who were selected on the basis of the test results. If only the best half
of the applicant group has been selected, criterion data can be collected only for this
group. The calculated correlation will then be much lower than if we had studied the
entire unselected group. The effect on the observed correlation can be dramatic when
only a very small proportion of the applicants is selected.

This problem is illustrated in Figure 4.3, where the shaded part represents the
selected group included in the study. The plot shows that a correlation calculated for
the selected group alone would be much lower than one based on the entire applicant
group. If we assume an even stricter selection and move the cutoff line on the X axis
to the right, the shaded field will become almost circular, implying a correlation close
to zero.

FIGURE 4.3 Illustration of restriction of range.
There are few empirical examples of the phenomenon of restriction of range. One
of the few dates back to World War II, when applicants to the U.S. Army Air Forces
were tested and selected. Because of the shortage of pilots at that time, all the
applicants were admitted into the basic flying program. The predictive validity could
then be calculated for the entire group as well as for a selected group. The predictive
validity of the total test score (pilot stanine) was .64. If the normal procedure had been
applied and only the top 13% of the candidates had been selected, the predictive
validity would have dropped to .18 (Thorndike 1949). This illustrates the dramatic
effect of calculating predictive validity in a highly selected group. In other words, by
studying only the selected applicants, we have excluded the low scorers who would
otherwise serve as a comparison group. The correction for range restriction is based
on information about the test score standard deviation in the whole group or on the
proportion of applicants selected (Hunter and Schmidt 2003). When selection is based
on several tests, or on tests combined with other types of information, the situation
becomes more complicated, and more advanced models for range restriction
correction should be applied (Lawley 1943; Johnson and Ree 1994).
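For the simplest case, where applicants were selected directly on the test being
validated, a common correction (often called Thorndike’s Case II) rescales the
restricted correlation by u, the ratio of the applicant-group standard deviation to the
selected-group standard deviation. The Python sketch below implements that formula
and, as an added assumption, derives u from the selection ratio by treating test scores
as normally distributed and selection as strictly top-down. It does not cover the
multivariate situations for which more advanced models are needed, and the function
names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def correct_direct_range_restriction(r_restricted, u):
    """Thorndike Case II correction.
    u = SD of the predictor among all applicants / SD among those selected."""
    if u < 1:
        raise ValueError("u should be at least 1 when the selected group is restricted")
    return u * r_restricted / sqrt(1 + (u ** 2 - 1) * r_restricted ** 2)


def u_from_selection_ratio(p):
    """SD ratio implied by selecting the top proportion p of a normally
    distributed predictor (assumes strict top-down selection on that test)."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - p)          # cutoff score in z units
    lam = std_normal.pdf(z) / p            # mean z score of those selected
    restricted_var = 1 + z * lam - lam ** 2
    return 1 / sqrt(restricted_var)


# Hypothetical case: r = .20 observed among the top 30% of applicants.
u = u_from_selection_ratio(0.30)
print(round(correct_direct_range_restriction(0.20, u), 2))
```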

The third statistical artifact is the use of a dichotomous criterion (e.g., pass/fail
in training) when the performance in question (flying skills) is really a continuous
variable. This artifact also leads to a lower correlation between test and criterion than
if performance had been measured on a continuous scale. It can be corrected if the
proportions passing and failing are known: the further the split departs from 50/50,
the greater the correction.
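One way to carry out this correction, assuming that pass/fail splits an underlying
normally distributed performance variable, is to divide the observed correlation by
the attenuation factor φ(c)/√(pq), where p and q are the pass and fail proportions and
c is the corresponding cutoff on a standard normal scale (the familiar point-biserial to
biserial conversion). The Python sketch below is illustrative, and the function name is
our own.

```python
from math import sqrt
from statistics import NormalDist


def correct_for_dichotomy(r_obs, pass_rate):
    """Correct a correlation computed against a pass/fail criterion, assuming
    pass/fail splits an underlying normally distributed performance variable.
    The attenuation factor is phi(c) / sqrt(p * q), where c is the cutoff
    that separates proportions p and q on a standard normal scale."""
    if not 0 < pass_rate < 1:
        raise ValueError("pass_rate must be strictly between 0 and 1")
    std_normal = NormalDist()
    p, q = pass_rate, 1 - pass_rate
    cutoff = std_normal.inv_cdf(q)  # z score above which a proportion p passes
    attenuation = std_normal.pdf(cutoff) / sqrt(p * q)
    return r_obs / attenuation


for rate in (0.5, 0.8, 0.95):
    print(rate, round(correct_for_dichotomy(0.30, rate), 3))
```

With a 50/50 split the attenuation factor is about .80, and it shrinks as the split
becomes more uneven, so the corrected value grows as the pass rate moves away
from 50%.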

For both pilot and air traffic controller selection, all these statistical artifacts are
frequently present and contribute to a lower observed correlation between test and
criterion. When possible, the observed correlations should therefore be corrected for
these error sources, both before they are included in a meta-analysis and to provide
a better estimate of the true predictive validity. Unfortunately, such corrections are
often difficult because the primary studies frequently lack the necessary information.
Criterion reliability is rarely examined, and the selection ratio is not reported in many
articles. The percentage passing or failing training is normally reported, which makes
it possible to correct for the effect of using a dichotomous criterion. In studies where
such corrections are not implemented, the observed correlations must be viewed as
very conservative estimates of the tests’ predictive validity. In addition, sampling error
will contribute to variation between the observed correlations in different studies.
However, this error is unsystematic; therefore, the correction methods described above
cannot be used to correct for it.

4.4.2 When Can Test Validity Be Generalized? 

How do we know whether the predictive validity of a test can be generalized across
different settings? For example, can intelligence tests always be used successfully for
selection, regardless of setting and occupation? Hunter and Schmidt (2003) proposed
a rule of thumb: if at least 75% of the observed variance in the correlations can be
attributed to statistical artifacts and sampling error, then it is reasonable to assume
that the remaining variance is due to error sources not corrected for. In such
instances, it is safe to assume that the true variance between studies is very small or
zero and that the mean correlation is an appropriate estimate of the true validity.
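The 75% rule can be illustrated with a "bare-bones" version of the Hunter and
Schmidt procedure. The Python sketch below corrects only for sampling error, using
the common estimate (1 − r̄²)² / (N̄ − 1) for the sampling error variance, where r̄ is
the sample-size-weighted mean correlation and N̄ the average sample size; it ignores
the other artifacts discussed above. The study data and the function name are
invented for illustration.

```python
def bare_bones_meta(studies):
    """Sample-size-weighted meta-analysis of correlations (sampling error only).

    studies: list of (sample_size, observed_r) pairs.
    Returns the weighted mean r, the observed variance of r, the expected
    sampling-error variance, and the percentage of observed variance that
    sampling error accounts for (capped at 100)."""
    total_n = sum(n for n, _ in studies)
    mean_r = sum(n * r for n, r in studies) / total_n
    var_obs = sum(n * (r - mean_r) ** 2 for n, r in studies) / total_n
    mean_n = total_n / len(studies)
    var_error = (1 - mean_r ** 2) ** 2 / (mean_n - 1)
    if var_obs == 0:
        pct_explained = 100.0
    else:
        pct_explained = min(100.0, 100 * var_error / var_obs)
    return mean_r, var_obs, var_error, pct_explained


# Invented validation studies: (number of trainees, observed validity).
studies = [(120, 0.28), (85, 0.35), (200, 0.22), (60, 0.41), (150, 0.30)]
mean_r, var_obs, var_error, pct = bare_bones_meta(studies)
verdict = "validity generalizes" if pct >= 75 else "true variation likely"
print(f"mean r = {mean_r:.2f}, {pct:.0f}% of variance explained -> {verdict}")
```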

The second situation arises when there is real variance in the population; it is
then possible to estimate an interval (credibility interval) that, with high probability,
includes the true predictive validity. This interval is calculated from the corrected
average correlation and the estimated population standard deviation (Whitener
1990). If the interval is wide and also contains zero, the actual variation between
studies is considerable, and in some settings the test has no predictive validity. In
other cases, however, the interval is of a certain width but does not include zero. This
means that there is some variation in the predictive validity, but that it is always larger
than zero. In that case, the measure can be used as a valid predictor, even though its
utility will vary from setting to setting.
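A credibility interval can be sketched by subtracting the artifact variance from the
observed variance to estimate the population standard deviation, then taking the
mean plus or minus a normal quantile times that standard deviation. The Python
function below assumes an 80% interval by default (about ±1.28 standard deviations),
a conventional choice rather than one fixed by the method, and floors the residual
variance at zero. The inputs in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def credibility_interval(mean_rho, var_obs, var_artifact, level=0.80):
    """Credibility interval for the true validity.

    mean_rho: (corrected) mean correlation across studies.
    var_obs: observed variance of the correlations.
    var_artifact: variance expected from sampling error and other artifacts.
    The residual (population) variance is floored at zero."""
    sd_rho = sqrt(max(var_obs - var_artifact, 0.0))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.28 for an 80% interval
    return mean_rho - z * sd_rho, mean_rho + z * sd_rho


low, high = credibility_interval(0.30, 0.020, 0.008)
print(f"80% credibility interval: {low:.2f} to {high:.2f}; includes zero: {low <= 0 <= high}")
```

Here the interval has some width but stays above zero, which corresponds to the case
in which the test is always a valid predictor even though its utility varies.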

It is also possible to perform a significance test of the variation between studies, 
but this is a less common strategy in the Hunter and Schmidt meta-analysis method, 
where the estimation of variance is emphasized. 

4.5 HISTORICAL OVERVIEW 
4.5.1 Pilot Selection 

Probably few professions have been tested as much as pilots (see, for example, Hunter 
1989). During World War I, the first tests were developed and validated—not many 
years after the Wright brothers made their first flight (Dockeray and Isaacs 1921). 
Many of the first tests were simple constructions simulating tasks or situations with 
which humans involved in flying would have to cope. One of the earliest test batteries
from the United States (Henmon 1919) contained tests that measured emotional
stability, reaction time, general cognitive abilities, and sense of equilibrium. 

In Europe, similar tests were developed in several countries. In Denmark, Alfred 
Lehman (Termphlen 1986) developed methods for pilot selection in his laboratory. 
He suggested tests that measured emotional stability, evaluation of spatial relationships,
attention, reaction time for sound, and sense of equilibrium. The test that
measured emotional stability consisted of psychophysiological measurements taken while
a test administrator fired a shot behind the candidate’s back.
Lehman suggested that the test was unsuitable for selection because it was not possible
to distinguish those who were really cold-blooded from those who reacted to
stress induced in the test situation (Termphlen 1986). There were many similarities
between the tests that were used in different countries in this first phase of test
development. Paper-and-pencil tests were used together with apparatuses that simulated
aspects of a flying machine, in addition to simple measures of reaction time and 
judging distance and time. 

After World War I had ended, there was little research on pilot selection in
most countries (Hilton and Dolgin 1991; Hunter 1989). An exception was Germany,
where a large number of tests were developed; at the beginning of World War II,
the country had a test battery that consisted of 29 tests that measured, among other
things, general intelligence, perceptual abilities, coordination ability, character, and
leadership (Fitts 1946). During the war, this test battery was replaced by a less extensive
system with fewer tests and more emphasis on references and interview data
(Fitts 1946).

In England, the United States, and Canada, the trend was different. At the start of 
the war, few tests were in use; by the end of the war, a large number of tests had been 
developed and implemented. In Norway, the Norwegian Air Force first used tests
in 1946 (Riis 1986). Since then, the Norwegian test battery has been expanded and 
validated several times (see, for example, Martinussen and Torjussen 1998, 2004; 
Torjussen and Hansen 1999). 

After World War II, research declined again, and many countries put emphasis on 
maintenance rather than on developing new tests. This more or less continued until 
the first computerized tests were invented in the 1970s and 1980s (Bartram 1995; 
Hunter and Burke 1987; Kantor and Carretta 1988). As computer technology became 
cheaper and better, the paper-and-pencil tests were replaced entirely or partially with 
computerized tests in most Western countries (Burke et al. 1995). 

Early in the history of aviation, personal qualities, as well as cognitive and
psychomotor skills, were seen as important in order to become a good pilot. To examine
which personality traits were important, both observation of pilots and participant
observation were used. After having undergone flight training, Dockeray concluded
that “quiet methodical men were among the best flyers, that is, the power and quick
adjustment to a new situation and good judgment” (Dockeray and Isaacs 1921). It
would still be many years before personality tests were developed and used for pilot
selection. In the United States, a comprehensive program to find suitable personality
measures for pilot selection was started in the 1950s. The research program was led by
Saul Sells (1955, 1956), and a total of 26 personality measures were evaluated. Sells
and his colleagues used more long-term criteria of pilot performance in their
evaluation, and they concluded that personality tests were better predictors of long-term
criteria than ability tests, whose predictive validity declined over time.

A number of well-known personality inventories have also been examined 
in relation to pilot selection over the years. These include the MMPI (Minnesota 
Multiphasic Personality Inventory) (Melton 1954), Eysenck Personality Inventory 
(Bartram and Dale 1982; Jessup and Jessup 1971), Rorschach (Moser 1981), and 
Cattell 16PF (Bartram 1995). The results showed only low to moderate correlations 
with the criterion. One of the few studies on civilian pilots (conducted at Cathay 
Pacific Airlines) showed that, based on training results, successful pilots scored 
lower on anxiety compared to less successful pilots (Bartram and Baxter 1996). 

In Sweden, a projective test called the “defense mechanism test” (DMT) was 
developed by Ulf Kragh (1960). The purpose was to select applicants for high-risk
occupations such as pilots and deep-sea divers. The test material consisted of
pictures presented with a special slide projector. Exposure of
each picture was very short, but increased each time the image was presented. The
person drew and explained what he or she saw, and the discrepancies between the
actual image and what the person reported were then interpreted as various defense
mechanisms. (This is a very simplified overview of a complicated and comprehensive
scoring procedure; see, for example, Torjussen and Værnes 1991.) The test was
met with considerable optimism when it was launched, and it was tested on military
pilots in several countries such as England, the Netherlands, and Australia, as well
as in Scandinavia (Martinussen and Torjussen 1993).

However, it has been difficult to document the predictive validity of the test for 
pilots outside the Scandinavian countries, and very few countries currently use the 
test. With the introduction of computers in testing, a number of personality-related 
concepts have been evaluated. These include measures of risk taking, assertiveness,
field dependency, and attitudes (see Hunter and Burke, 1995, for an overview). In 
most cases, this has only resulted in very small correlations with the criterion and 
no increase in the predictive validity beyond what the ability tests predicted (i.e., no 
incremental validity). Recent studies, however, have yielded more positive results for 
personality measures used for pilot selection (see, for example, Bartram and Baxter 
1996; Hormann and Maschke 1996). 

Personality traits are emphasized today in varying degrees during the selection 
process. Some countries, like the United States and England, do not use personality 
tests in their selection process (Carretta and Ree 2003). However, personal
qualities and motivation may be assessed during the interview.

In addition to ability, psychomotor, and personality tests, biodata, past experience,
and flying performance in the simulator have been used as predictors in varying 
degrees over the years. 

4.5.2 Selection of Air Traffic Controllers 

In most Western countries, applicants for air traffic controller education are selected
with psychological tests, but research in this area is less extensive
than that for pilot selection (Edgar 2002). The first psychological tests were
put into use early in the 1960s and consisted of paper-and-pencil tests (Hatting 1991). 
Today, computerized tests are used in many countries, and the selection process is 
often as comprehensive as that for pilots. 

Most validation studies in relation to air traffic controllers have been conducted by 
the Federal Aviation Administration (FAA). The first test batteries adopted in the 1960s
had paper-and-pencil tests measuring reasoning (verbal and numerical), perceptual 
speed, and spatial skills (Hatting 1991). Test results were combined with information 
about education, age, and experience in the selection process. In the 1970s, the FAA 
began developing a simulation-based test that would measure the candidates’ skills in 
applying different rules in a simulated airspace. The test was later adapted to a
paper-and-pencil format and labeled “multiplex controller aptitude test.” It was used together
with measures of reasoning ability and professional experience in the selection from 
the beginning of the 1980s. The development of a computerized test battery began in 
the 1990s, and it measured, among other things, spatial reasoning, short-term memory, 
sense of movement, pattern recognition, and attention (Broach and Manning 1997). 

In Europe, EUROCONTROL conducted a review of the member states’ selection procedures
in the late 1970s (Hatting 1991) and discovered that most countries
applied tests that measured spatial perception, verbal ability, reasoning, and
memory. Few countries used tests to map out the interest or motivation for the profession.
An exception was Germany where, in addition to a comprehensive test battery, the
German Aerospace Center (Deutsches Zentrum für Luft- und Raumfahrt) also used
a measure of personality traits and a simulation-based test to measure cooperation
(Eißfeldt 1991, 1998). In addition, all countries had medical requirements and formal
requirements concerning age and previous education. 

In the 1980s, computerized tests were developed in Germany and England, and 
in 2003 EUROCONTROL launched a common computer-based test battery (called 
FEAST) for the selection of air traffic controllers that the member countries could 
apply. As part of this project, the member countries would also have to supply data 
to a joint study of the predictive validity of the tests. 

Today, many countries have replaced paper-and-pencil tests entirely or partially
with computerized tests. Basic cognitive abilities are measured; in addition, tasks are
assigned in which computer technology is used to simulate parts of the work as an
air traffic controller. In Sweden, a considerable effort has been put into developing
a situational interview, where the purpose is to map individual abilities and social
attitudes (Brehmer 2003). Situational interviews are developed from critical incidents,
and the main goal is to distinguish effective from ineffective work performance
by asking the applicants very specific questions. Some personality tests have also
been investigated, but the results have generally been discouraging, with weak
correlations between measures and the criterion. Studies based on the big-five model,
however, have found more positive results (Schroeder, Broach, and Young 1993).

4.6 HOW WELL DO THE DIFFERENT METHODS WORK? 

Fewer validation studies have been conducted for air traffic controllers than
for pilots. In 2000, a meta-analysis of available studies found a total of 25 articles
and reports that documented validation results for ATC selection based on a total 
of 35 different samples (Martinussen, Jenssen, and Joner 2000). These studies were 
published between 1952 and 1999, and the majority were based on applicants and 
students (92%). Most of the studies were conducted in the United States (77%), and 
the criteria used were mostly collected during training (e.g., pass/fail, instructor 
evaluations, simulators). 

The results from these 25 articles were combined in a meta-analysis in which the
average predictive validity was calculated and the population variance of the validity
coefficients was estimated. The total samples ranged between 224 and 11,255 persons for the
different categories of predictors. Virtually none of the studies reported information
that made it possible to correct for unreliability in the criterion and restriction of
range. The average correlations are therefore an underestimate of the true predictive
validity. Correlations were corrected for the use of a dichotomous criterion
(pass/fail). The various tests and predictors were grouped into categories, and a summary
of these results is presented in Figure 4.4. For all but two of the predictors (verbal
skills and multitasking), there was some true variance between studies.
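The text does not say which formula was used for the pass/fail correction. The sketch below shows one standard correction for an artificially dichotomized criterion. It assumes that performance is continuous and normally distributed, and that a proportion p passes. Under those assumptions the observed r is attenuated by the factor φ(c)/√(p(1 − p)), where c is the cut point on the standard normal scale. The function names and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def dichotomization_factor(pass_rate):
    """Attenuation factor when a normal criterion is split into pass/fail.

    Assumes a proportion `pass_rate` of the candidates falls above the cut.
    """
    std = NormalDist()
    cut = std.inv_cdf(1 - pass_rate)  # cut point on the z scale
    return std.pdf(cut) / sqrt(pass_rate * (1 - pass_rate))


def correct_for_dichotomy(r_observed, pass_rate):
    """Estimate the correlation with the underlying continuous criterion."""
    return r_observed / dichotomization_factor(pass_rate)


# Hypothetical: observed r = .24 against pass/fail, with 80% passing training
print(round(dichotomization_factor(0.50), 3))
print(round(correct_for_dichotomy(0.24, 0.80), 3))
```

The factor is largest (about .80) for a 50/50 split. It shrinks as the pass rate moves toward 0 or 1, so the correction matters most when nearly everyone passes.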

FIGURE 4.4 Meta-analysis results for air traffic controllers. (Chart titled “Meta-Analysis Results for ATC Selection,” plotting mean predictive validity, mean r.)

When it comes to pilots, a large number of validation studies have been conducted
since World War I. Several literature reviews (see, for example, Carretta
and Ree 2003; Hunter 1989) and two meta-analyses have been published (Hunter
and Burke 1994; Martinussen 1996). In spite of a slightly different database and
some procedural differences in the way the meta-analyses were conducted, the two
meta-analyses resulted in very similar findings. An overview of the average
correlations for the different test categories from Hunter and Burke (1994) is presented in
Figure 4.5. A total of 68 studies published between 1940 and 1990 were included
with a total sample of 437,258 participants. The average correlation was not
corrected for any statistical artifacts because the primary studies did not include the
necessary information to make this possible. For all test categories, there was some
true variation between studies, and for some test categories the credibility interval
included zero, implying that the predictive validity in some situations was zero.
This applied to the categories general intelligence, verbal skills, fine motor
ability, age, education, and personality. This means that, for the other categories, the
predictive validity was greater than zero even though there was some true variance
between studies.

4.7 PERSONALITY AND JOB PERFORMANCE 

Meta-analysis results (Martinussen 1996; Martinussen et al. 2000) for both pilots
and air traffic controllers have shown that cognitive ability tests can be used
effectively in the selection process, but the results are less impressive when it comes to
personality measures. This finding may be due to several factors and does not imply
that personality is unimportant for work performance. On the contrary, job
analyses for both pilots and air traffic controllers have listed personality characteristics
as important in order to do a good job. For air traffic controllers, cooperation, good
communication skills, and the ability to cope with stress were emphasized (Eißfeldt and
Heintz 2002). For military pilots, qualities such as achievement motivation and the
ability to make decisions and act quickly, in addition to emotional stability, have been
seen as particularly important (Carretta et al. 1996). In another study, American
fighter pilots (N = 100) were asked to rate 60 personality traits in relation to various
aspects of the job. The most important of these dimensions was conscientiousness
(Siem and Murray 1994).

FIGURE 4.5 Meta-analysis results for pilots. (Chart titled “Meta-Analysis Results for Pilot Selection,” plotting mean predictive validity.)

How can we then explain the lack of predictive validity? One possibility is that 
the personality tests used have not been suitable for selection purposes. For example, 
clinical instruments originally designed to diagnose problems or pathology may not 
be appropriate for personnel selection. Another possibility is that many of these tests
are self-report measures, which makes it easy for applicants to choose the more socially
desirable response and present themselves in a favorable light. A meta-analysis of
personality measures based on the big-five model showed that although the
applicants to some degree presented themselves in a favorable light, this had little effect
on the predictive validity of the measures (Ones, Viswesvaran, and Reiss 1996).

Another factor is that the criteria used are often obtained during training, and
it is reasonable to expect that some cognitive abilities are more important in this
educational setting than personality traits. This is in line with the findings of Sells (1955,
1956) suggesting that personality tests were better predictors in the longer term.
However, few validation studies have used actual job performance as the criterion, so
there is little empirical evidence for this claim.

In a study of 1,301 U.S. pilot students, the results indicated that the male
candidates were more outgoing and scored lower on agreeableness compared to the norm.
When examining the subscales of the big-five inventory, several differences between
pilots and the normal population can be found: They are less vulnerable
(neuroticism); they are active, outgoing, and seek new experiences (openness); and 
they are coping oriented and competent (conscientiousness). Female pilots showed 
many of the same characteristics compared to a normative sample of women; in 
addition, they scored higher on openness to new experience. In other words, they like 
to try out new things (Callister, King, and Retzlaff 1999). 

In another study of 112 pilots from the U.S. Air Force, female pilots were
compared with male pilots and with a random sample of women. Female pilots scored
higher than their male colleagues on the dimensions agreeableness, extroversion, 
and conscientiousness. They were also more emotionally stable and scored higher on 
openness (King, Retzlaff, and McGlohn 1998). 

A recent study of U.S. pilot students showed that, compared with normative data, 
these pilots could be described as people who set themselves high goals and engaged 
in constructive activities to achieve these goals. The goals often included new and 
unfamiliar experiences and also the quest for increased status, knowledge, and skills. 
They often appeared calm, less inhibited, and more willing to tolerate risk
compared to a normative sample (Lambirth et al. 2003).

In studies that have compared air traffic controllers with other professions, it has 
been found that air traffic controller students scored lower than other students on 
anxiety (Nye and Collins 1993). In another study using the big-five taxonomy, they 
scored higher on openness to new experience and conscientiousness and lower on 
neuroticism than a norm group (Schroeder et al. 1993).

There is, however, little evidence to support the notion of a fixed pilot personality
or air traffic controller personality. Pilots and air traffic controllers vary on a number 
of personality characteristics in the same way that people in the general population 
vary. At the same time, there are some differences between the pilots and air traffic 
controllers as a group and the general population, probably as a function of the
selection process and self-selection before people enter these professions.

4.8 COMPUTER-BASED TESTING 

The first computerized tests were developed in the 1970s and 1980s (Bartram 1995; 
Hunter and Burke 1987; Kantor and Carretta 1988). As computer technology has 
become both cheaper and more efficient, paper-and-pencil tests have entirely or
partially been replaced with computerized tests in most Western countries (Burke et
al. 1995). The introduction of computerized testing has led to simplifications in test 
administration and scoring. But both computers and software need to be updated, so 
this type of testing also requires maintenance and revisions. 

An advantage of the use of computerized testing is that it has made it possible to 
test more complex psychological abilities and skills than before. It is now possible to 
measure reaction time and attention, both alone and as part of a more complex task. 
It is also possible to simulate parts of future work tasks and to present information
both on the screen and through headphones. A problem associated with such complex
dynamic testing is that the test may progress differently for different applicants.
Choices and priorities made at an early stage could affect both the
workload and the complexity of the task later.

In addition, applicants may use different skills and strategies for problem
solving. For example, someone may give priority to speed rather than safety and
accuracy. This makes the scoring of such tests more complicated than with simpler
tests, where the number of correct answers is usually a sufficient index of
performance. Another aspect of such dynamic tests is that they often require longer
instruction and introduction periods before the testing can begin. This makes them
time consuming. They are therefore often used at a later stage in the selection
process, when the applicant group has already been tested with simpler tests and only the
strongest candidates are permitted to enter the final phase of the selection process.

Computers have made it possible to use adaptive testing. That is, the
difficulty of each task is determined by how the candidate performed on the
preceding tasks in the test. This type of test is based on item response theory (see, for
example, Embretson and Reise 2000) in which the purpose is to estimate the person’s 
ability level. These tests are expensive in the developmental stage, and most tests 
used today are based on classical test theory. 
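The adaptive principle can be illustrated with a small simulation. This is a sketch under several assumptions, none of which come from the text. Items follow the Rasch (one-parameter logistic) model. Ability is estimated as the expected a posteriori (EAP) value under a standard normal prior on a grid. Each next item is the one whose difficulty is closest to the current estimate, which maximizes item information under the Rasch model. The item bank and the deterministic "simulated candidate" are hypothetical. Operational tests use large calibrated item banks and formal stopping rules.

```python
import math


def rasch_p(theta, b):
    """P(correct) under the Rasch (one-parameter logistic) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def eap_estimate(responses):
    """Expected a posteriori ability with a standard normal prior.

    responses: list of (item_difficulty, answered_correctly) pairs.
    """
    grid = [-4.0 + 0.1 * i for i in range(81)]  # ability values from -4 to +4
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for b, correct in responses:
            p = rasch_p(theta, b)
            w *= p if correct else 1.0 - p
        weights.append(w)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


def run_adaptive_test(item_bank, answer, n_items=10):
    """Give n_items items, each matched to the current ability estimate."""
    remaining = list(item_bank)
    responses = []
    theta_hat = 0.0  # start at the population average
    for _ in range(min(n_items, len(remaining))):
        # Under the Rasch model, an item is most informative when b is near theta.
        b = min(remaining, key=lambda d: abs(d - theta_hat))
        remaining.remove(b)
        responses.append((b, answer(b)))
        theta_hat = eap_estimate(responses)
    return theta_hat, responses


# Hypothetical bank of 25 items with difficulties from -3.0 to +3.0
bank = [-3.0 + 0.25 * i for i in range(25)]
# Simplified candidates: correct whenever the item is easier than their ability
for true_theta in (-1.5, 1.5):
    est, log = run_adaptive_test(bank, lambda b, t=true_theta: b < t)
    print(f"true ability {true_theta:+.1f}: estimate {est:+.2f} after {len(log)} items")
```

Each candidate receives a different sequence of items. Strong candidates quickly move to harder items, and weak candidates move to easier ones, so few items are spent on tasks that are far too easy or far too hard.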

With the Internet, it is now possible to test applicants located anywhere in the 
world. This is financially beneficial to the organization because it saves travel 
expenses. One problem with this approach, however, is how to ensure that the
applicant, and not someone else, is answering the test questions. Nevertheless, it is
likely that more organizations will adopt the Internet for such pretesting. This
also allows applicants to investigate whether this is something for
them to pursue. In the next step in the selection process, the best applicants will be
invited to participate in further testing, where it can be checked whether the correct
person actually took the tests. Some organizations provide applicants with information
about the tests, and they are also given the opportunity to practice some of the tests 
before the selection process begins. 

4.9 THE UTILITY OF SELECTION METHODS 

Utility refers to how much an organization earns by applying specific methods in the 
selection process rather than using a more random selection process (for an overview 
of the topic, see Hunter 2004). Several models may be used in order to estimate this, 
and one critical factor in these models is, of course, the value of a good employee 
relative to a less efficient employee. A rule of thumb that has proven to hold for many 
occupations is that the best employees produce about twice as much as the worst. In 
utility calculations, the expenses associated with the selection process and testing
have to be included, but this amount is often much smaller than the cost of a
poorly performing employee or a candidate who fails to complete an expensive training
program. In addition, it is to be hoped that increased safety is also an outcome,
but this is harder to document empirically because serious accidents are rare in
aviation and those who could constitute the control group are normally not hired.
FIGURE 4.6 Right and wrong selection decisions.

Calculations performed by the German Aerospace Center in 2000 showed that 
the selection of ab initio candidates cost €3,900 per candidate, while the training cost 
€120,000. If the candidate fails training, the expenditure is estimated to be €50,000 
(Goeters and Maschke 2002). The corresponding figure from the U.S. Air Force is 
between $50,000 and $80,000 for candidates who fail training (Hunter and Burke 
1995). In other words, the test costs are relatively low compared to the costs for those 
who do not complete pilot training. 

A simple model that may also be used to calculate the utility of the selection 
procedure is based on the Taylor and Russell (1939) tables. Their model assumes a
dichotomous criterion, which is illustrated in Figure 4.6. According to the model, 
two correct decisions can be made: 

1. Select those who would perform the job successfully. 

2. Do not hire those who would not perform the job satisfactorily. 

Two erroneous decisions can also be made: 

1. Employ those who will not be successful (candidates who clearly pose the 
greatest problem for the organization). 

2. Reject those who would manage the job. 

The correct decisions are marked with (+) in the figure and the erroneous ones are
marked with (-). The larger the correlation (in other words, the predictive validity),
the more often correct decisions are made. To calculate the increase in correct
decisions by using a given selection method, we need to know the predictive validity, the
selection ratio, and how many would perform the job successfully in the applicant 
group (base rate). 
For example, assume that the aviation authorities want to hire 20 people to
perform security checks at a smaller airport. Assume that there are 100 applicants and
that approximately 50% would be able to perform the job. If we do not use any form
of selection process, but rather pick randomly, approximately half of the candidates
would do the job satisfactorily. Suppose that we instead use an ability test and the
predictive validity is .40. To sum up, the selection rate is 20/100 = 0.20, the base rate is 0.50,
and the predictive validity is .40. What is the utility of using the test? By inspecting
the Taylor and Russell tables, we find that we increase the proportion who would succeed
in the job from 50 to 73%. If the predictive validity is lower (for example, .20), the
increase would be from 50 to 61%. If the selection rate is lower, so that fewer people are
selected (e.g., only the top 5% of the applicants), the proportion would increase from 50
to 82% (with r = .40). In other words, the lower the selection rate and the higher the
validity, the greater the advantage of using the test.
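The Taylor and Russell tables rest on the assumption that test scores and job performance are bivariate normal. Under that assumption, applicants are selected top-down on the test, and "success" means performance above the cut that yields the base rate. With those assumptions, the tabled proportion can be computed directly by integrating over the selected test scores. The sketch below does this with Simpson's rule. The function name `taylor_russell` is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def taylor_russell(validity, selection_ratio, base_rate, steps=2000):
    """Expected proportion of selected applicants who succeed.

    Assumes test and performance are bivariate normal with correlation
    `validity`, top-down selection on the test, and a success cut that
    a proportion `base_rate` of all applicants would pass.
    """
    if not -1.0 < validity < 1.0:
        raise ValueError("validity must lie strictly between -1 and 1")
    x_cut = STD_NORMAL.inv_cdf(1.0 - selection_ratio)  # test cut score
    y_cut = STD_NORMAL.inv_cdf(1.0 - base_rate)  # success cut on performance
    resid_sd = (1.0 - validity ** 2) ** 0.5

    def integrand(x):
        # Density of test score x times P(performance above cut | x)
        p_success = 1.0 - STD_NORMAL.cdf((y_cut - validity * x) / resid_sd)
        return STD_NORMAL.pdf(x) * p_success

    # Simpson's rule over [x_cut, x_cut + 10]; the normal tail beyond is negligible
    lo, hi = x_cut, x_cut + 10.0
    h = (hi - lo) / steps  # steps must be even
    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    joint = total * h / 3.0  # P(selected and successful)
    return joint / selection_ratio


# The worked example above: base rate .50
for r, sr in [(0.40, 0.20), (0.20, 0.20), (0.40, 0.05)]:
    share = taylor_russell(r, sr, base_rate=0.50)
    print(f"r = {r:.2f}, selection ratio = {sr:.2f}: {share:.0%} succeed")
```

The computed values should come close to the tabled ones. Small differences in the second decimal can arise from rounding in the printed tables and from the numerical integration. With a validity of zero, the function returns the base rate, which corresponds to random selection.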

The effect of changes in the base rate is somewhat more complicated: the farther
the base rate is from 0.50, the smaller the increase in successful applicants.
Imagine a rather extreme situation with no qualified applicants; no matter how high
the predictive validity is, it will not improve the outcome, because no one would be
able to do the job. Additional examples are presented in
Table 4.1.

More sophisticated models for calculating the utility of selection tests do not
assume that performance is dichotomous (success/failure), but rather that it can be
assessed on a continuous scale. The most difficult part in these calculations is to
estimate the dollar value of different workers. One way to do this is to assume a dollar
value for a very productive employee, for example, by estimating how much it would
cost the company to hire someone to perform the same job. A good employee is
one in the top layer (i.e., at about the 85th percentile), compared to a poor worker, who
is at the 15th percentile or lower. In addition, the predictive validity and the quality
of the applicants must also be known. These estimates can then be used to calculate 
how much the organization will earn per year per person employed. Included in these 
calculations are the costs of selection (see, for example, Cook 1998). 
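One well-known model of this continuous kind is the Brogden-Cronbach-Gleser utility model. It estimates the yearly gain as the number hired × validity × the standard deviation of performance in money terms × the average standardized test score of those hired, minus the testing costs. The sketch below assumes normally distributed performance value and top-down selection on a normal test score. It also derives the money-valued standard deviation from the 85th and 15th percentile values mentioned above. The text does not name this specific model, and all figures are hypothetical.

```python
from statistics import NormalDist


def bcg_utility(n_hired, validity, value_85th, value_15th, selection_ratio,
                cost_per_applicant, years=1.0):
    """Brogden-Cronbach-Gleser utility estimate (all money in one currency).

    Assumes normally distributed performance value and top-down hiring
    on a normally distributed test score.
    """
    std = NormalDist()
    # Under normality the 15th and 85th percentiles lie about 2 x 1.04 SD apart
    sd_value = (value_85th - value_15th) / (2 * std.inv_cdf(0.85))
    # Average standardized test score of those hired (top-down selection)
    cut = std.inv_cdf(1 - selection_ratio)
    mean_z_hired = std.pdf(cut) / selection_ratio
    n_applicants = n_hired / selection_ratio
    gain = n_hired * years * validity * sd_value * mean_z_hired
    return gain - n_applicants * cost_per_applicant


# Hypothetical: 20 hires from 100 applicants, r = .40, yearly value of
# 60,000 at the 85th percentile vs. 20,000 at the 15th, testing cost 100 each
print(round(bcg_utility(20, 0.40, 60_000, 20_000, 0.20, 100)))
```

As in the Taylor and Russell model, the gain grows with validity and with a lower selection ratio. Here, though, the gain is expressed in money rather than as a proportion of successful hires.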


TABLE 4.1

Utility of Selection Methods

                          Selection Rate
Base Rate     r       0.10     0.30     0.50
10%          .10      .13      .12      .11
10%          .30      .22      .17      .14
10%          .50      .32      .22      .17
50%          .10      .57      .55      .53
50%          .30      .71      .64      .60
50%          .50      .84      .74      .67

Note: Cell entries are the proportions of selected applicants expected to succeed on the job.

Source: The examples are from Taylor, H. C., and Russell, J. T. 1939. Journal of Applied Psychology
23:565-578.
4.10 FAIRNESS IN SELECTION

Many countries have laws that prohibit discrimination on the basis of gender, race, and 
political or religious convictions when hiring. Some countries, such as the United States, 
also have laws against discrimination because of age or disability. Many countries have 
also adopted guidelines on the use of tests in employment where there are specified 
requirements for the test user and the methods. This is further discussed in Chapter 2. 

A topic that has been discussed for a long time is how the concept of fairness
should be understood. Is a method fair if it has the same predictive validity for
different applicant groups, or is it only fair if each group is hired in proportion to its
representation in the applicant group or perhaps in the population? If, for example,
women constitute 30% of the applicant group but only 5% of those hired, then the
selection method has adverse impact.

Opinions differ on which group should be used for comparison. Is it the total 
adult population in a country, or is it the group of qualified candidates that should be 
considered? The latter is probably the most reasonable choice when calculating the 
adverse impact. In the United States, the employer must ensure that the method used
does not have a so-called adverse impact. One solution is then to make sure to hire a
certain proportion of the groups that would otherwise have been underrepresented. Alternatively,
the employer must argue that the test or method is job related and has predictive
validity. It must also be ruled out that other, equally valid methods would have less
adverse impact. The legal practice on this topic appears to be far stricter in the
United States (where many lawsuits have been instrumental in shaping today’s
practice) than in Europe.
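In the United States, a common screening rule of thumb for adverse impact is the "four-fifths rule" from the federal Uniform Guidelines on Employee Selection Procedures. Under this rule, a group's selection rate below 80% of the rate for the group with the highest rate is generally taken as evidence of adverse impact. The text does not describe this rule. The sketch below shows the arithmetic only and is not a legal test. The counts and function names are hypothetical.

```python
def adverse_impact_ratio(hired_focal, applicants_focal, hired_ref, applicants_ref):
    """Ratio of the focal group's selection rate to the reference group's."""
    rate_focal = hired_focal / applicants_focal
    rate_ref = hired_ref / applicants_ref
    return rate_focal / rate_ref


def four_fifths_flag(ratio, threshold=0.8):
    """True if the ratio falls below the four-fifths (80%) guideline."""
    return ratio < threshold


# Hypothetical pool: 30 women and 70 men apply; 1 woman and 19 men are hired
ratio = adverse_impact_ratio(1, 30, 19, 70)
print(f"impact ratio = {ratio:.2f}, adverse impact flagged: {four_fifths_flag(ratio)}")
```

The ratio compares selection rates within the applicant pool. The choice of comparison group discussed above therefore directly affects the result.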

4.10.1 Differential Predictive Validity 

Two hypotheses are concerned with the claim that a test is valid for one group only 
(often white men), but not for others (e.g., different ethnic groups). One is that the 
test predicts job performance for one of the groups, but not for the other. The second
hypothesis is that the tests are valid for both groups, but have a higher validity
for one of the groups (differential validity). It has been difficult to document this, 
however, probably because the primary studies that have examined the predictive 
validity for the two groups have often included very small samples of minorities. 
Meta-analyses have not found support for the hypothesis of differential validity for 
white versus nonwhite, beyond what one might expect as a result of chance (Hunter, 
Schmidt, and Hunter 1979). 
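The sample-size problem mentioned above can be illustrated with a standard significance test for the difference between two independent correlations. The test uses Fisher's r-to-z transformation. It is a generic technique, not the meta-analytic method used in the cited studies. The validity coefficients and sample sizes below are hypothetical.

```python
import math
from statistics import NormalDist


def compare_validities(r1, n1, r2, n2):
    """Two-tailed test of whether two independent correlations differ.

    Uses Fisher's r-to-z transformation; returns (z statistic, p value).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Same validity gap (.40 vs .25) with a small and with a large second sample
print(compare_validities(0.40, 500, 0.25, 40))
print(compare_validities(0.40, 5000, 0.25, 5000))
```

With only 40 people in the second group, even a difference of .15 in validity is far from statistically significant. The same difference is highly significant when both samples are large.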

Similarly, in a meta-analysis where validity coefficients for women were
compared with those for men, there was no overall support for differential test validity (Rothstein
and McDaniel 1992). However, it turned out that for jobs that required little
education and where there was a very high proportion of women or of men, some
support for differential predictive validity was discovered. A study of U.S. Air Force
applicants found no gender difference in tests’ predictive validity in relation to flight
performance (Carretta 1997).

All in all, little evidence supports the idea of differential validity, but this does 
not imply that all groups necessarily will have the same mean test score on all tests. 
Fair methods imply that the test predicts job performance equally accurately in the
different groups, rather than that groups have the same average test score. 

4.11 APPLICANT REACTIONS 

Most of the research in personnel selection has concerned the development and validation of selection methods. It is easy to understand why an organization wants to focus on these areas: it needs to ensure that the methods with the highest predictive validity are used for selection. A related perspective is to consider the selection process from the applicant’s point of view. How does the person experience the selection methods, and what impression does he or she form of the organization on the basis of the selection process? The methods that applicants prefer are not necessarily those with the highest predictive validity.

Nevertheless, we should be concerned about the applicant’s perspective for several reasons. First, applicants’ perceptions of the methods’ validity and fairness could influence their motivation to perform well on the tests. If the methods appear strange and unrelated to the job, applicants may be reluctant to do their best. Second, these perceptions may affect the chances that applicants will accept a future offer of employment from the organization. When young people have many opportunities for education and employment, it is important to attract the best applicants.

Several studies have examined what applicants or volunteers (often students) think about different methods, such as ability tests, personality tests, and interviews. What applicants prefer depends to some extent on the context (i.e., the job sought) and on whether they feel they have some control over what is going on. The content of the tests and of the interview questions is also critical; in general, applicants prefer job-relevant questions.

Two studies of pilot applicants examined what the candidates thought about the selection process. One was based on the selection of pilots for Lufthansa (Maschke 2004); the other examined the selection of pilots for the Norwegian Air Force (Lang-Ree and Martinussen 2006). In both cases, the applicants were satisfied with the selection process and considered the procedures fair. The applicants were also asked to rate the different tests and methods used, and the computerized tests received the highest ratings. The Norwegian study also examined the relationship between applicants’ responses and their test performance (Lang-Ree and Martinussen 2006). Perhaps not surprisingly, those who had the most positive attitudes towards the selection system performed slightly better on the tests.

4.12 SUMMARY 

This chapter has dealt with important principles in personnel selection and with how selection methods, including tests, should be evaluated. Development of a selection system should start with a job analysis that specifies the abilities and personal characteristics needed to accomplish the job. Methods suited to precise measurement of those capabilities or characteristics should then be chosen. It is important to have evidence that the methods have predictive validity, either from local validation studies or from meta-analyses. Other important aspects of the selection process are the applicants’ reactions and attitudes and the fairness of the methods used. The utility of the selection can be calculated in dollars or in terms of correct decisions. Often, the costs of conducting a selection process are small relative to the costs of employees who do not perform well or who are unable to complete training.
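Utility expressed in terms of correct decisions can be illustrated with a 2 × 2 table that crosses the selection decision (accept or reject) with the later outcome (success or failure). The sketch below assumes hypothetical counts and hypothetical field names; in practice, the outcomes of rejected applicants are rarely observed directly, so that half of the table usually has to be estimated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectionOutcomes:
    """Counts from a hypothetical 2 x 2 table: selection decision vs. later outcome."""
    selected_succeeded: int      # correct acceptances
    selected_failed: int         # incorrect acceptances
    rejected_would_succeed: int  # incorrect rejections (normally estimated)
    rejected_would_fail: int     # correct rejections (normally estimated)

    def total(self) -> int:
        return (self.selected_succeeded + self.selected_failed
                + self.rejected_would_succeed + self.rejected_would_fail)

    def proportion_correct(self) -> float:
        """Share of all decisions that turned out to be correct."""
        return (self.selected_succeeded + self.rejected_would_fail) / self.total()

    def success_rate_among_selected(self) -> float:
        """Share of selected applicants who went on to succeed."""
        selected = self.selected_succeeded + self.selected_failed
        return self.selected_succeeded / selected


# Hypothetical counts for 200 applicants.
outcomes = SelectionOutcomes(40, 10, 20, 130)
print(outcomes.proportion_correct())           # 0.85
print(outcomes.success_rate_among_selected())  # 0.8
```

The proportion of correct decisions summarizes the whole table, whereas the success rate among those selected is what the organization experiences directly as training completions and failures.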
5.1 INTRODUCTION

In the earliest days of aviation there were no instructor pilots. The first aviators, such 
as Orville and Wilbur Wright, Octave Chanute, Otto Lilienthal, and the Norwegian 
Hans Fleischer Dons* (originally a submarine officer), trained themselves. They 
were simultaneously test pilots and student pilots, with the inevitable consequence 
that many died during the process of discovering how to maintain control of their 
aircraft (including Lilienthal, who died in a glider crash in 1896). Aspiring modern 
pilots are fortunate to be the beneficiaries of the experiences of these pioneers, along 
with several succeeding generations of pilots who have also made their contributions 
to the art and science of aviation training. However, not all advances in aviation 
training have come from pilots. Researchers (most of them not pilots) in the fields 
of psychology and education have also helped shape the format, if not the content, 
of current aviation training. Principles of how humans learn new skills, developed in the laboratory, have been applied advantageously to pilot training. In this chapter, we will examine some of those principles and how they are applied in an aviation setting, along with the general process of training development.

Training is a broad term that covers a number of activities conducted in a variety of settings. Training can be categorized according to when it occurs: for example, initial training required to impart a new skill set as opposed to refresher training required to maintain those skills. It can also be categorized according to where it occurs: for example, in the classroom, in the simulator, or in an aircraft. Training can also be categorized according to its content: for example, whether it addresses purely technical issues, such as the computation of weight and balance, or nontechnical issues, such as crew coordination. Regardless of how one chooses to categorize training, these activities share a common goal: developing in the trainee a set of skills and knowledge to a specified level of competency.


Training: The systematic process of developing knowledge, skills, and attitudes; activities leading to skilled behavior.


The processes and methods used to achieve the desired training goals are informed and shaped by the scientific research dealing with human learning. Beginning with the late nineteenth century work of Ebbinghaus, a great deal of research has been directed at understanding the conditions that influence human learning. This is an immense body of work far beyond the scope of this chapter. A large number of introductory volumes on human learning are available for the student who desires more information. Recent examples include Hergenhahn and Olson (2005), Ormrod (2007), and Mazur (2006). In addition to these general works, there are works dedicated primarily to the issues of training in aviation. Some examples here include Henley (2004), O'Neil and Andrews (2000), and Telfer and Moore (1997).

* For more information on this and other early Norwegian aviators, consult the very interesting book, 100 Years of Norwegian Aviation (National Norwegian Aviation Museum 2005).

Later in this chapter, we will examine some of the training issues specific to aviation. However, it is important to note that training, like the design of the physical apparatus described in an earlier chapter, should be thought of as a system. The design of effective training, particularly when such training will be used by multiple students and instructors over an extended period, requires careful consideration of a number of interrelated factors. Therefore, before we consider the specifics of aviation training, we should examine these factors and the techniques that have been developed to ensure successful training system development.

5.2 TRAINING SYSTEM DESIGN 

The rigor applied to the design of a training system reflects its planned application. That is, very little rigor is typically applied to the design of training that will be used with only one or two people for a job of little significance. Most people have experienced training of this sort. For example, a new employee is shown, in a training program lasting about 1 minute, how to operate the office copier. There is no formal training plan. There is no formal evaluation of the student’s retention of the material. Furthermore, there is no assurance that this employee will receive the same instruction as the next employee. Standardization is lacking because standardization and the planning it entails are expensive, and the consequence of failing to perform the task successfully is simply a few sheets of wasted paper.

Contrast this with the consequences of failing to train a pilot properly to execute an instrument approach or a rejected takeoff. Here the consequences are potentially severe, both in terms of money and in terms of human life. Clearly, the latter situation requires a more formal approach to the design of a training program to ensure that all the necessary elements are addressed and that the student achieves a satisfactory level of performance. The expense of a rigorously developed training program is justified by the consequences of failure and, for many organizations such as the military services, by the large numbers of personnel to be trained over an extended period.

One method for ensuring that rigor is applied is the use of the systems approach to training (SAT), also known as instructional system development (ISD).* This set of procedures for developing training systems originated with the American military in the mid-1970s. It is described in detail in a multivolume military handbook (U.S. Department of Defense 2001) and in manuals produced by each of the individual services. For example, a manual produced by the U.S. Air Force (1993) adapts the general guidance given in the Department of Defense handbook to the specific needs of the Air Force. Similarly, individualized documents exist for the U.S. Navy (NAVEDTRA 130A; 1997) and the U.S. Army (TRADOC Pamphlet 350-70; n.d.).

* Much of this discussion of SAT/ISD is derived from Air Force Manual 36-2234 (U.S. Air Force 1993), to which the interested reader is directed for more detailed information. Meister (1985) is also a good source of general information.

This structured approach to the development of training has also been embraced by civil aviation. For example, its principles are included in the Advanced Qualification Program (AQP) promoted by the Federal Aviation Administration and in the Integrated Pilot Training program established by Transport Canada. These programs reflect the systems approach to training embodied in the International Civil Aviation Organization’s Convention on International Civil Aviation: Personnel Licensing (ICAO n.d.).

To put it simply, SAT/ISD is a process that provides a means to determine 

• who will be trained; 

• what training will involve; 

• when training will take place; 

• where training will take place; 

• why training is being undertaken; and 

• how training is accomplished. 

The first reaction of many pilots and nonpilots on seeing such a list is

Why all the fuss? Why is an elaborate system needed for such simple questions? Surely, the answers to these questions are obvious:

• Whoever wants to be a pilot will get training. 

• We will show trainees how to fly the airplane. 

• We will fly in the morning or afternoon. 

• The training will take place in the aircraft. 

• We are doing the training so that the student can fly the plane. 

• The instructor will show how it is done, and then trainees will do it themselves. 

The other reaction is “Why not just keep on doing what we have always done? After 
all, we have been training pilots for years.” 

These concerns are understandable and may even have merit in some situations; however, they reflect a rather limited view of the world. If one is considering only a single instructor and a few aspiring student pilots, then the training will take place more or less as outlined in the second set of bullets. This is a casual approach to a situation that does not demand highly efficient training or rigorous quality control of the product.

Consider, however, the situation faced by almost every military service and by many air carriers. The military services take in large numbers of recruits who have no previous military experience and who, in most cases, lack the technical skills and knowledge required to perform the duties of their military specialty. In a relatively short time, these recruits must receive military and technical training that will enable them to function as part of a military unit, where the consequences of failure to perform are often very high. Clearly, their training cannot be left to chance. If for no other reason than to minimize the enormous costs associated with large-scale military training (e.g., the cost of producing one helicopter pilot in the U.S. Army is approximately one million U.S. dollars; Czarnecki 2004), the training programs must be designed to deliver exactly what is needed, in a format and at a time that ensure the trainees achieve a satisfactory level of competence. Achieving this goal requires careful analysis and planning, and these are precisely what SAT/ISD provides.

The five stages of the SAT/ISD process are 

• analyze; 

• design; 

• develop; 

• implement; and 

• evaluate. 


Task: “A single unit of specific work behavior, with clear beginning and ending 
points, that is directly observable or otherwise measurable. A task is performed 
for its own sake, that is, it is not dependent upon other tasks, although it may 
fall in a sequence with other tasks in a mission, duty, or job” (U.S. Department 
of Defense, 2001, p. 47). 


5.2.1 Analyze 

The SAT/ISD process begins with an analysis of the job for which a person is to 
be trained. The objective is to develop a complete understanding of what it takes to 
perform the job. During this stage, a task inventory is compiled in which all the tasks 
associated with the job are listed. In addition, the standards, conditions, performance 
measures, and any other criteria that are needed to perform each of the tasks on the 
task inventory must also be identified. 

This job analysis is typically performed by observing personnel (job incumbents) on the job and noting what they do, or by interviewing incumbents about what they do. In the military services in particular, more formal occupational survey and analysis procedures may be used. One example of a technique used to capture such information is the critical incident technique (CIT; Flanagan 1954). Once a list of the tasks performed as part of a job is compiled, the CIT procedures can be used to identify tasks that are preeminent in terms of their frequency, difficulty, and failure consequence. Clearly, more attention should be devoted in the training process to tasks that are difficult, that occur frequently, and that have dire consequences if not performed properly than to tasks that are seldom performed, are easy, and have little impact. However, until a comprehensive list of the tasks that make up a job has been compiled and each task analyzed, there is no basis for deciding which tasks are important and which are insignificant.
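Once each task on the inventory has been rated for frequency, difficulty, and failure consequence, the ratings can be combined to rank tasks for training emphasis. The sketch below is a hypothetical scoring rule (a weighted sum of 1 to 5 ratings), not part of the CIT itself or of any official SAT/ISD procedure; the task names, ratings, and weights are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRating:
    name: str
    frequency: int    # hypothetical scale: 1 = rarely performed ... 5 = performed very often
    difficulty: int   # 1 = easy ... 5 = very difficult
    consequence: int  # 1 = little impact ... 5 = dire if performed poorly


def priority(task: TaskRating, weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three ratings (a hypothetical scoring rule)."""
    wf, wd, wc = weights
    return wf * task.frequency + wd * task.difficulty + wc * task.consequence


def rank_tasks(tasks: list[TaskRating], weights=(1.0, 1.0, 1.0)) -> list[TaskRating]:
    """Return tasks ordered from highest to lowest training priority."""
    return sorted(tasks, key=lambda t: priority(t, weights), reverse=True)


inventory = [
    TaskRating("Make a routine radio call", frequency=5, difficulty=1, consequence=2),
    TaskRating("Execute a rejected takeoff", frequency=1, difficulty=5, consequence=5),
    TaskRating("Compute weight and balance", frequency=4, difficulty=2, consequence=4),
    TaskRating("File post-flight paperwork", frequency=5, difficulty=1, consequence=1),
]

for task in rank_tasks(inventory):
    print(f"{priority(task):4.1f}  {task.name}")
```

A weighted sum is used here so that a rarely performed but difficult and critical task can still rank near the top; raising the consequence weight would push such tasks higher still.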
In addition to examining the job and its constituent tasks, attention must also be paid to the eventual recipients of the training. This group is referred to as the target audience: the people who will complete the training and then go on to do the job. Just as the design of the hardware must consider the capabilities and limitations of the users, so must the training system be designed with its eventual users in mind. During the analysis phase, these users must be identified and described in detail; this information will be critical in the design phase. Consider, for example, the importance of knowing whether the training system is to be designed to accommodate student pilots with no prior flying experience or whether it will be used by experienced commercial pilots who have already mastered the basics. This is an extreme example, perhaps, but the point is that designers of training systems cannot make assumptions about the characteristics of those who will use the training.

As a further example, in the international world of aviation, a minimum command of the English language is required. Training designers would be ill advised to assume that all the students participating in a new training course have a satisfactory command of English or even that they all have the same level of fluency. Clearly, there are substantial national and regional differences in the teaching of English. A thorough analysis would therefore assess this issue to determine whether remedial instruction in English is required for a specified target audience before the technical training can begin. Similar comments could be made with regard to computer literacy, experience in driving automobiles, prior mechanical experience, and intellectual level, to name but a few of the many possible examples. All of these predecessor, or enabling, conditions must be identified during the analysis phase.

5.2.2 Design 

In this phase, the instructional strategies are determined and the instructional methods and media are selected. One instructional strategy might be not to provide formal training at all. Rather, one might elect to use some sort of apprenticeship and have all learning take place as on-the-job training. This is a rather common strategy, particularly for jobs requiring lower skill levels. Consider, for example, the training program for a carpenter’s helper. This might be as simple as telling the new helper to follow the carpenter and do what he or she says to do. Admittedly, this is a rather extreme example and not likely to be found in aviation settings. However, a strategy that combines some formal training with on-the-job training is fairly common. This is particularly true for tasks that are seldom performed.

The occupational analysis procedures used in the analysis phase may result in the identification of some tasks that are rarely performed. These may even be tasks with significant consequences, but which occur so infrequently that any training provided during initial qualification would be lost by the time it became necessary to perform the task. (The issue of skill decay will be discussed in some detail later in this chapter.) The instructional system designers are thus placed in a quandary. Do they include instruction on the performance of a task that they know will be forgotten, on average, long before the task must be performed? Do they rely upon refresher training to maintain task proficiency? Do they utilize some sort of just-in-time training scheme, so that when the need for the task arises, the incumbent can quickly learn the required skills?

In some instances, the latter approach is satisfactory, particularly when dealing with maintenance of highly reliable electronic systems. The reader may reflect upon the last (if any) time he or she was called upon to install or replace the disk drive in a personal computer. Manufacturers of computer disk drives expect computer owners to accomplish this task without the benefit of any technical training. Owners are able to do so (usually without injury to themselves or their personal computer) due in large part to the well-designed, step-by-step instructions, with accompanying graphics, provided by disk manufacturers. The instructions provide just-in-time training that enables the personal computer owner to accomplish the required task. Thus, a person who has never accomplished this task before and has received no training can, during the hour or so required to read and follow the directions, successfully complete the installation and then promptly forget everything just learned until the next occasion arises. At that time, the cycle of abbreviated training, task performance, and skill decay will be repeated.

Of course, in some instances, the task demands require that even rarely performed 
tasks be trained to a high level of performance and that frequent refresher training 
be given to maintain skill levels. In aviation, the rejected takeoff (RTO) is a prime 
example of such a task. In such a time-critical situation, task performance must be 
immediate and flawless. During the 2 seconds or so in which the RTO decision must 
be reached and an appropriate action initiated, there is no time available to consult 
even the ubiquitous checklist, let alone pull out the abnormal procedures handbook 
and learn how to perform the task. 

During the SAT/ISD analysis phase, these tasks and their performance constraints should have been identified so that appropriate instructional strategies may be selected. In each case, the strategy should be appropriate for the task for which the student is being trained.

In addition to the instructional strategy, the instructional developer must also select the instructional methods and media during this phase. Some of the instructional methods include:

• lecture; 

• demonstration; 

• self-study; 

• computer-based (CB) training; and 

• on-the-job training (OJT). 

The media available include: 

• printed media; 

• overhead transparencies; 

• audio tape recordings; 

• 35 mm slide series; 

• multimedia presentations; 
• video and film; and

• interactive courseware. 

The selection of the instructional method and media forms part of the plan of instruction developed during this phase. This plan is focused on learning objectives. These are statements of what is expected to be accomplished at each stage of the training program, and they proceed in such a sequence that, at each stage, the skills and knowledge required to achieve the new learning objective have already been put into place by the previous learning activities. For example, if the objective of a certain set of instructions is for the student to complete an instrument approach successfully, then at some earlier stage the student must have learned how to tune the radio, how to maintain aircraft attitude by reference to the attitude indicator, and how to initiate and control a descent, along with many other skills. Thus, the order in which new material is presented is critical to success and must be considered carefully in the plan of instruction. The analysis phase will have identified these predecessor skills and knowledge, and the instructional designer must ensure that these dependencies are respected during the design phase.
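Sequencing learning objectives so that every prerequisite comes first is, in computational terms, a topological sort of a dependency graph. The sketch below uses Python’s standard-library `graphlib` module; the prerequisite map is a hypothetical, heavily simplified version of the instrument-approach example above (in particular, treating attitude control as a prerequisite for controlling a descent is an assumption for illustration).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical prerequisite map: objective -> objectives that must be mastered first.
PREREQUISITES = {
    "tune the navigation radio": set(),
    "maintain attitude by reference to the attitude indicator": set(),
    "initiate and control a descent": {
        "maintain attitude by reference to the attitude indicator",
    },
    "fly an instrument approach": {
        "tune the navigation radio",
        "maintain attitude by reference to the attitude indicator",
        "initiate and control a descent",
    },
}


def sequence_objectives(prereqs: dict[str, set[str]]) -> list[str]:
    """Order learning objectives so every prerequisite precedes its dependents."""
    try:
        return list(TopologicalSorter(prereqs).static_order())
    except CycleError as exc:
        # exc.args[1] holds the objectives that form the circular dependency.
        raise ValueError(f"circular prerequisites: {exc.args[1]}") from exc


for step, objective in enumerate(sequence_objectives(PREREQUISITES), start=1):
    print(step, objective)
```

More than one valid order usually exists (here, the radio and attitude objectives may come in either order), which leaves the designer free to choose among them on other grounds. A circular set of prerequisites is reported as an error, signaling a flaw in the analysis rather than in the plan of instruction.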

5.2.3 Develop 

Once the objectives have been established and the training strategies and activities have been planned, it is time to carry out the design by creating a formal course syllabus, writing lessons, producing the instructional materials, and, if necessary, developing interactive courseware. The key document to be created is the plan of instruction (POI) or course syllabus. This document controls the planning, organization, and conduct of the instruction. It is the blueprint for providing instruction in a course and is used to develop the individual lesson plans that instructors use to deliver the instruction.

Perhaps the most visible, or at least the most voluminous, product of this phase is 
the actual instructional material. During this phase, the books, pamphlets, student 
guides, videotapes, slides, transparencies, simulators, mock-ups, and everything else 
identified during the design phase and listed in the POI are produced. 

Also constructed during this phase are the tests that will determine whether the students have achieved the mastery levels specified in the POI. This is a very important component, and the development of these tests must adhere to sound psychometric principles. Because these issues are covered at some length in Chapter 2 and Chapter 4 of this book, they will not be addressed here, other than to note that the care given to tests used to select personnel for pilot training must also be applied to the tests that determine whether, at the end of training, the trainees are qualified to be pilots.

As one final activity in this phase, it is always wise to try out the new training program on a limited basis to ensure that everything proceeds according to plan. It is said that no battle plan survives the first contact with the enemy. Similarly, even in the best planned training program, it is almost inevitable that some things will have been overlooked; this will become glaringly obvious when the full course is administered to real students. Implicit assumptions may have been made about student capabilities that turn out to be faulty. Estimates of the time or number of trials required to learn some new skill to criterion levels may have been too optimistic. Language that seemed perfectly clear to the developers and subject matter experts may prove hopelessly confusing to naive students.

Even though each of the individual sections and training components may have been tried out with students as they were being developed, one final test of the entire system is prudent. This test may reveal effects of dependencies among the training elements that were not evident when each element was evaluated in isolation. Success in this trial provides the trigger to move to the next phase with confidence.

5.2.4 Implement 

It is at this stage that the new training program becomes operational. If all the preceding stages have been accomplished successfully, then the expected students will arrive at the proper locations. The instructional materials will be on hand in sufficient quantities, and all hardware, such as simulators, will be operational. The instructors and support personnel will be in place and ready to begin instruction. This marks the boundary, then, between the largely technical activities of the preceding phase and what is now mainly a management or administrative activity.

5.2.5 Evaluate 

When conducted properly, evaluation is ongoing throughout the SAT/ISD process. Each of the preceding phases should include some sort of evaluation component. For example, during the analysis phase, some method is needed to ensure that all the tasks comprising the job have been included in the analysis and that appropriate, rigorous task analytic methods have been applied to ensure a quality result. During the design phase, the selection of methods and media should be evaluated against the learning objectives to ensure that they are appropriate. During the development phase, the instructional material being created must be checked carefully for validity. For example, the radio phraseology taught during flight training must be checked against the phraseology prescribed by the civil or military authority, because departures from standard phraseology lead to confusion and errors. Finally, at the implementation phase, several evaluation components are possible, such as assessing the quality of the graduates. Overall, these evaluation activities can be divided into three general types:

• formative evaluation; 

• summative evaluation; and 

• operational evaluation. 

The formative evaluation process extends from the initial SAT/ISD planning through the small-group tryout. The purpose is to check on the design of the individual components comprising the instructional system. It answers the question, “Have we done what we planned to do?” That is, if the plan for achieving a specified learning objective called for the use of a partial simulation in which the student would be taught to operate the flight management system (FMS) to a specified level of competence, did that actually occur? Does the instructional system include the use of the partial simulation to achieve that particular learning objective? Do students who complete this particular training component demonstrate the level of competence required?

When this sort of evaluation is conducted early on, it allows the training developers to improve the training program while the system is still being developed, when changes can be made at the least cost. For example, the discovery that students completing the FMS training cannot perform all the tasks to a satisfactory level could lead to changes in the training design to modify the training content or to provide additional time for practice on the simulator. However, if this deficiency were not discovered until late in the training development process, then these relatively simple and inexpensive changes might no longer be possible. For example, the simulator might have been scheduled for other training components, so additional practice on the existing simulators is no longer possible. Either major changes must then be made to the training program or additional simulation assets must be purchased. Both of these alternatives have negative cost implications.

The summative evaluation involves trying out the instructional program in an operational setting on students from the target population. The basic question answered by this evaluation is “Does the system work?” That is, does the instructional system work under operational conditions? Typically, the summative evaluation examines the training program during the operational tryout of two or three classes. This provides enough data to identify such issues as a lack of adequate resources, needed changes to the schedules, inadequacies of the support equipment, a need for additional training of instructors, or modifications to the training materials to improve clarity. The summative evaluation also addresses the key question of graduate performance. That is, it should show whether the graduates of the training program can perform the jobs for which they are being trained. Just as the graduates of a medical college should be able to pass the medical licensing examination, the graduates of a pilot training school should be able to pass the licensing authority’s written and practical tests. If a significant number of graduates cannot meet these standards, then there is almost certainly something wrong with the training program.

Operational evaluation is a continuous process that should be put into place when the new training system is placed into operation. This form of evaluation is used to gather and analyze internal and external feedback data so that management can monitor the status of the program. At a minimum, it should provide for monitoring the status of graduates of the training course to ensure that they continue to meet job performance requirements. Changes in the proportion of graduates who pass licensing examinations, for example, should signal the need to reevaluate the training program. Perhaps some changes have taken place in the operational environment that dictate changes in the training program.

In an aviation context, the use of global positioning system (GPS) navigation provides
such an example. As GPS begins to supplement or eventually even replace very
high frequency omnidirectional range (VOR) navigation, pilot training programs
must be modified to provide instruction in GPS navigation techniques, perhaps
at the cost of decreased instruction in VOR navigation. An alert training institution
might make these changes proactively. However, an ongoing operational
evaluation component, particularly one that examined more than gross pass/fail
rates, should signal the need for change even if the changing technology were
not otherwise noticed.
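The monitoring step described above can be sketched in a few lines of Python. This is a minimal illustration, not a procedure drawn from SAT/ISD guidance or from any licensing authority: the `Cohort` record, the baseline pass rate, and the 5-percentage-point tolerance are hypothetical choices made only for the example.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """Licensing-exam results for one graduating class (hypothetical record layout)."""
    label: str
    graduates: int
    passed: int

    @property
    def pass_rate(self) -> float:
        return self.passed / self.graduates


def flag_declines(cohorts: list[Cohort], baseline: float,
                  tolerance: float = 0.05) -> list[str]:
    """Return the labels of cohorts whose pass rate falls more than
    `tolerance` below `baseline`. Both thresholds are illustrative
    assumptions, not values from any regulator or training standard."""
    return [c.label for c in cohorts if c.pass_rate < baseline - tolerance]


cohorts = [Cohort("2023A", 40, 37), Cohort("2023B", 38, 34), Cohort("2024A", 42, 33)]
print(flag_declines(cohorts, baseline=0.92))
```

A program that wanted to look beyond gross pass/fail rates could apply the same check separately to each task area of the practical test (navigation, for example), so that a decline confined to one area is not masked by stable overall results.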

5.2.6 Conclusion 

The use of SAT/ISD provides a framework for the development of training. It does 
not prescribe methods or modes—a feature for which it is sometimes criticized, but 
which others argue is a strength of the technique. Its use is ubiquitous in the military, 
including military aviation. Hence, some familiarity with its basic concepts is desirable,
even for those who may not be directly involved in the development of a large-scale
training program. Adherence to its general precepts and procedures will
inform and guide any training development, even at a modest level. The casual flight 
instructor may not utilize a formal SAT/ISD process in planning how he or she will 
teach student pilots; but, at least some consideration of learning objectives, choice 
of teaching modes, evaluation, and the other SAT/ISD components would almost 
certainly result in an improved training delivery and a better trained student. 


Cockpit/crew resource management (CRM): The effective use of all available 
resources (people, weapon systems, facilities, and equipment, and environment)
by individuals or crews to safely and efficiently accomplish an assigned
mission or task (definition from U.S. Air Force 2001). 


5.3 CREW RESOURCE MANAGEMENT 

The preceding section has dealt in some length with a generalized system for devel¬ 
oping training, without regard for the specific elements to be trained. Typically, 
training is thought of in conjunction with the technical skills of operating an aircraft.
These include such skills as reading a map, reading and interpreting weather
forecasts, accurately calculating weight and balance, and proper movement of the 
controls so as to accomplish a desired maneuver. However, other skills are also 
valuable. These skills are typically referred to under the general heading of cockpit 
resource management (CRM), although in Europe they may also be termed non-technical
skills (NOTECHS). This area includes such things as getting along with
crew members, knowing when and how to assert oneself effectively in critical
situations, and maintaining situational awareness. For the most part, training in 
CRM presupposes competency in all the technical skills required to operate an 
aircraft. However, a well-developed training program for ab initio pilots, particularly
one that has been developed in accordance with the precepts of SAT/ISD,
may well include CRM as a stand-alone element or as part of the technical skill 
training. 
Ab initio: Latin for “from the beginning.” This refers to individuals with no
previous experience or training in the relevant subject matter. 


Situation awareness: “Situation awareness is the perception of the elements 
in the environment within a volume of time and space, the comprehension of 
their meaning and the projection of their status in the near future” (Endsley 
1988, p. 97). 


The need for CRM training was identified in the 1980s as a result of studies of 
the causes of predominantly civilian airliner accidents. Christian and Morgan (1987) 
reviewed the causes of aircraft accidents and found that the following human factors 
contributed to mishaps: 

• preoccupation with minor mechanical irregularities; 

• inadequate leadership and monitoring; 

• failure to delegate tasks and assign responsibilities; 

• failure to set priorities; 

• failure to communicate intent and plans; 

• failure to utilize available data; and 

• failure to monitor other crew members in the cockpit adequately. 

Perhaps the seminal article on this subject, however, is that of Foushee (1984), 
in which he applied the techniques and terminology of social psychology to the 
environment of an air carrier cockpit. From that point, interest in CRM has mushroomed.
The evolution of CRM in commercial aviation is documented by Helmreich,
Merritt, and Wilhelm (1999). However, a vast literature on this subject now extends 
beyond aviation to such settings as the hospital operating theater (cf. Fletcher et al. 
2003) and the control rooms of off-shore drilling platforms (cf. Salas, Bowers, and 
Edens 2001). 

A major proponent of CRM has been the Federal Aviation Administration (FAA) 
in the United States. The FAA has sponsored an extensive program of research on 
CRM, most notably by Robert Helmreich and his colleagues at the University of 
Texas. Based on the results of the research of Helmreich and many others, the FAA 
has produced publications that provide guidance and definitions to air carriers and 
others regarding the desired characteristics of CRM training. According to the CRM 
advisory circular produced by the FAA (2004, p. 2): 

CRM training is one way of addressing the challenge of optimizing the human/machine
interface and accompanying interpersonal activities. These activities include team
building and maintenance, information transfer, problem solving, decision-making,
maintaining situation awareness, and dealing with automated systems. CRM training
is comprised of three components: initial indoctrination/awareness, recurrent practice
and feedback, and continual reinforcement.

That same FAA advisory circular lists the characteristics of effective CRM:

• CRM is a comprehensive system of applying human factors concepts to improve 
crew performance. 

• CRM embraces all operational personnel. 

• CRM can be blended into all forms of aircrew training. 

• CRM concentrates on crew members’ attitudes and behaviors and their impact 
on safety. 

• CRM uses the crew as the unit of training. 

• CRM is training that requires the active participation of all crew members. It 
provides an opportunity for individuals and crews to examine their own behavior
and to make decisions on how to improve cockpit teamwork.


5.3.1 Generations of CRM 

In their review of the evolution of CRM, Helmreich et al. (1999) identified five distinct
generations of CRM, starting with the first comprehensive CRM program, launched
by United Airlines in 1981. These authors (p. 20) characterize first-generation
courses as “psychological in nature, with a heavy focus on psychological testing and 
such general concepts as leadership.” Perhaps not surprisingly, some pilots resisted 
some of these courses as attempts to manipulate their personalities. 

The second generation of CRM courses is characterized as having more of a focus 
on specific aviation concepts related to flight operations. In addition, the training 
became more modular and more team oriented. This is reflected in the change of 
names from “cockpit resource management” in the first generation to “crew resource 
management.” This training featured much more emphasis on team building, briefing
strategies, situation awareness, and stress management. Although participant
acceptance was greater for these courses, some criticism of the training and its use of 
psychological jargon remained. Helmreich et al. note that second-generation courses 
continue to be used in the United States and elsewhere. 

In the third generation of CRM courses, beginning in the early 1990s, the 
training began to reflect more accurately and holistically the environment in 
which the aircrews operate. Thus, organizational factors such as organization 
culture and climate began to be included. This may reflect to some degree the 
writings by Reason (1990) at about that time, in which he described the multiple- 
layer concept of accident causality. In that model (to be described in more detail 
in Chapter 8 on safety), organizational factors are clearly identified as possible 
contributors to accidents. 

The initiation of the advanced qualification program (AQP; Birnbach and 
Longridge 1993; Mangold and Neumeister 1995) by the FAA in 1990 marked the 
beginning of the integration and proceduralization of CRM that define the fourth 
generation. AQP allows air carriers to develop innovative training to meet their particular
needs. However, the FAA requires that both CRM and line-oriented flight
training (LOFT) be included in an air carrier's AQP. Because AQP
offers air carriers the advantage of customizing training so as to reduce
costs while maintaining a satisfactory product, most air carriers have adopted the
program. Accordingly, they have also developed comprehensive CRM programs 
based on detailed analyses of training requirements and human factors issues. 

Finally, Helmreich et al. suggest that a fifth generation of CRM training may 
be characterized by an explicit focus on error management. They propose that the 
ultimate purpose of CRM, which should be reflected in the training syllabus and the 
training exercises, is to develop effective means to manage risks. This philosophy 
reflects the Reason (1990, 1997) model of accident causality referred to earlier, in 
which it is acknowledged that perfect defenses against accidents do not exist. That 
is, despite the best laid plans and best designed systems, there will inevitably be 
failures, mistakes, and errors. It is prudent, therefore, to train to expect, recognize, 
and manage those risks. 

Thus, CRM has progressed from an early emphasis on teaching good interpersonal
relations to its current focus on effectively utilizing all the resources at the
disposal of a flight crew (other flight-deck crew, cabin attendants, dispatchers, air
traffic control, ground maintenance staff, etc.) to manage risk. In making this transition,
CRM has moved from a program with sometimes only vaguely defined goals
to a more sharply defined program with well-defined behavioral markers and
well-established assessment procedures. Even so, the debate over whether CRM works
continues. That is, does CRM actually result in improved safety? 

5.3.2 Evaluation of CRM Effectiveness 

The effectiveness of CRM has been debated in the scientific literature and on the 
flight deck since its inception. As noted previously, some early participants in the 
training resisted it because of its predominantly psychological nature, which they 
perceived as an attempt to manipulate their personalities. Changes in the training 
format and, to some extent, its content have largely eliminated these criticisms. 
However, even training that is well received and liked by participants may or may 
not achieve the desired effect. 

The effectiveness of CRM has been investigated by many researchers, and it has 
been summarized in a study by Salas and his associates (2001) at the University of 
Central Florida. These researchers reviewed 58 published studies of CRM training 
in an aviation setting to establish its effectiveness. They used Kirkpatrick’s (1976) 
typology for training evaluation as their framework to evaluate the effectiveness 
of the CRM training. The Kirkpatrick typology organizes the data obtained after 
training into the categories of reactions, learning, behaviors, and results (impact on 
organization). Of these categories, reactions to training are the easiest to collect and 
typically consist of the responses of participants to Likert scale statements such as “I 
found the training interesting.” Participants indicate the strength of their agreement 
or disagreement with the statement by choosing one of (typically) five alternatives: 
strongly agree, agree, undecided, disagree, or strongly disagree. Collections of such 
questions can be analyzed to assess the reactions of the participants to the training 
for any of several dimensions (e.g., interest, relevance, effectiveness, utility, etc.). 
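As an illustration of how such reaction data might be summarized, the Python sketch below codes the five response alternatives as 1 to 5 and averages them for each dimension. It is a minimal example under stated assumptions: the item identifiers, the item-to-dimension mapping, and the function name are hypothetical and are not taken from Salas et al. or any particular CRM questionnaire.

```python
from statistics import mean

# Numeric codes for the five response alternatives described in the text.
LIKERT_SCORES = {
    "strongly disagree": 1,
    "disagree": 2,
    "undecided": 3,
    "agree": 4,
    "strongly agree": 5,
}


def dimension_scores(responses: list[dict[str, str]],
                     item_dimension: dict[str, str]) -> dict[str, float]:
    """Average the coded responses for each dimension across all participants.

    responses      -- one dict per participant, mapping item id -> response text
    item_dimension -- maps each (hypothetical) item id to the dimension it measures
    """
    scores_by_dimension: dict[str, list[int]] = {}
    for participant in responses:
        for item, answer in participant.items():
            dimension = item_dimension[item]
            scores_by_dimension.setdefault(dimension, []).append(
                LIKERT_SCORES[answer.lower()])
    return {dim: mean(scores) for dim, scores in scores_by_dimension.items()}


# Hypothetical items: q1 and q2 tap "interest", q3 taps "relevance".
items = {"q1": "interest", "q2": "interest", "q3": "relevance"}
responses = [
    {"q1": "agree", "q2": "strongly agree", "q3": "undecided"},
    {"q1": "agree", "q2": "agree", "q3": "disagree"},
]
print(dimension_scores(responses, items))
```

Real instruments often include negatively worded items, which are normally reverse-scored (6 minus the code on a five-point scale) before averaging; this sketch omits that step for brevity.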
Of the 58 studies reviewed by Salas et al., 27 involved the collection of reaction
data. Their results showed that participants generally liked the CRM training and
that the training that utilized role play was better liked than the lecture-based training.
In addition, participants also felt that CRM training was worthwhile, useful,
and applicable.

Assessments at the second level (learning) have also shown that CRM generally
has a positive impact. This has been evidenced in studies of changes in both
attitudes and knowledge. Participants' attitudes toward CRM have been shown to
become more positive following the training. In addition, increased knowledge of
human factors issues, crew performance, stressors, and methods of dealing with
stressors has also been observed following CRM training.

Salas et al. found that 32 studies included some assessment of behavioral change 
following CRM training. Most commonly, this was assessed through the measurement
of CRM-related behaviors while participants engaged in a simulated flight,
although in some instances (11 out of 32) an online assessment of behavior was used.
Most of these studies showed that CRM training had a positive impact on behavior in
that CRM-trained crews exhibited better decision making, mission analysis, adaptability,
situation awareness, communication, and leadership. These findings provide
strong support for the impact of CRM on crew behavior. 

Although the studies of crew behavior indicate that CRM training has an impact, 
the effects of those changes in behavior have yet to be demonstrated clearly at an 
organizational level. Only six studies collected some form of evaluation data at this 
level, and Salas et al. (p. 651) noted that “the predominant type of evidence that 
has been used to illustrate CRM’s impact on aviation safety consists of anecdotal 
reports....” Thus, a clear relationship between CRM and the desired outcome of 
increased safety and corresponding decrease in accidents has yet to be demonstrated. 
It is not surprising, then, that Salas and colleagues end their review by calling for 
more and better evaluations to assess the safety impact of CRM training. 

The need for additional evaluations of CRM, in terms of both its method of
implementation and its content, is underscored by continuing evidence of
CRM-related contributions to accidents. In a military context, Wilson-Donnelly and
Shappell (2004) reported on a study whose objective was to determine which of 
the CRM skills included in the U.S. Navy CRM training program matched CRM 
failures identified in naval aviation accidents. The following seven critical skills of 
CRM were included in U.S. Navy training: 

• decision making; 

• assertiveness; 

• mission analysis; 

• communication; 

• leadership; 

• adaptability/flexibility; and 

• situational awareness. 
Of the 275 Navy/Marine Corps accidents involving some CRM failure during
the period 1990-2000, lack of communication was identified as the number one
CRM failure, occurring in over 30% of the accidents. Inadequate briefing was the
second most prevalent CRM failure, occurring in slightly over 20% of the accidents.
A previous study (Wiegmann and Shappell 1999) had found that CRM failures
contributed to more than half of all major (Navy Class A) accidents. Taken together,
these findings suggest that failures in the very areas specifically addressed by
Navy/Marine Corps CRM training continue to be a major factor in Navy/Marine
Corps accidents. Wilson-Donnelly and Shappell recognized the unsatisfactory nature
of this situation, but suggested that before the current CRM training program is 
modified or scrapped, similar analyses should be conducted of civilian accident data 
to see if the same situation holds there. 

Based on these analyses and investigations of the contributing factors in civilian 
accidents, it seems clear that not all crew resource management issues have been
successfully resolved by CRM training, at least in its current implementation.


Training transfer is the “extent to which the learned behavior from the training 
program is used on the job” (Phillips 1991). 


5.4 SIMULATOR TRAINING 

Every military pilot, every airline pilot, and many instrument-rated private pilots 
have had some exposure to flight simulators. They are used for a number of reasons,
not the least of which is the cost savings they provide over in-flight instruction.
For example, the U.S. Air Force estimates that an hour of training in a C-5 aircraft
costs $10,000, while an hour in a C-5 simulator costs $500 (Moorman 2002, as cited in
Johnson 2005).

Particularly for military and air carrier pilots, simulators also have the great
advantage of allowing pilots to practice maneuvers and respond to events that are far 
too hazardous to practice in a real aircraft. The rejected takeoff decision due to loss 
of an engine on takeoff is an obvious example from civil aviation. 

However, the utility of simulation for training is not a given. It has to be established
that the training provided on a device will influence behavior in the real environment.
Will what is learned in the simulator carry over into the aircraft? This is
the question usually referred to as transfer of training, and it has been the subject of 
much research. Clearly, the preponderance of results shows that what pilots learn in 
a simulator is carried over to the aircraft. 

Two major reviews of the transfer of training effectiveness in flight simulation 
have been conducted. The first study (Hays et al. 1992) reviewed the pilot training 
literature from 1957 to 1986. Using meta-analysis (see the discussion of this statistical
technique in Chapter 2), Hays et al. found that simulators consistently led to
improved training effectiveness for jet pilots, relative to training in the aircraft only. 
However, the same results were not found for helicopter pilots. 
In the second major review, Carretta and Dunlap (1998) reviewed the studies
conducted from 1987 to 1997. In particular, they focused on landing skills, radial 
bombing accuracy, and instrument and flight control. In all three of these areas, 
Carretta and Dunlap concluded that simulators had been shown to be useful for 
training pilot skills. 

In one of the studies of landing skills cited by Carretta and Dunlap (1998), Lintern 
and colleagues (1990) examined the transfer of landing skills from a flight simulator to 
an aircraft in early flight training. They compared one group of pilots who were given 
two sessions of practice on landings in a simulator prior to the start of flight training to 
a control group that was given no practice prior to the start of training in the aircraft. 
They found that the experimental group that had received the 2 hours of simulator 
training required 1.5 fewer hours prior to solo than the control group. For this group, 2 
hours of simulator time were equivalent to 1.5 hours of aircraft time. Comparisons of 
this sort are usually given in terms of the transfer effectiveness ratio (TER). 

Originated by Roscoe (1980), the TER expresses the degree to which hours in the 
simulator replace hours in the aircraft and is defined as 

TER = (control group time - experimental group time)/simulator training time

For example, if private pilot training normally requires 50 flight hours, and the use of a 
10-hour simulator training program reduces the requirement to 40 in-flight hours, then 

TER = (50 - 40)/10 = 1 

In other words, 1 hour of simulator time saves 1 hour of flight time.

For a more realistic example, if the simulator training took 10 hours, and 45 additional
hours of in-flight training were required, then

TER = (50 - 45)/10 = 0.5 

This is interpreted to mean that 1 hour of simulation saves 0.5 hour of flight time. 
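The TER definition and the two worked examples above translate directly into a small Python function. This is a sketch for checking the arithmetic; the function and parameter names are our own, not taken from Roscoe (1980).

```python
def transfer_effectiveness_ratio(control_hours: float,
                                 experimental_hours: float,
                                 simulator_hours: float) -> float:
    """Roscoe's TER: aircraft hours saved per hour of simulator training.

    control_hours      -- aircraft hours the control group needed to reach criterion
    experimental_hours -- aircraft hours the simulator-trained group needed
    simulator_hours    -- simulator time given to the experimental group
    """
    if simulator_hours <= 0:
        raise ValueError("simulator_hours must be positive")
    return (control_hours - experimental_hours) / simulator_hours


# The two worked examples from the text.
print(transfer_effectiveness_ratio(50, 40, 10))  # 1.0
print(transfer_effectiveness_ratio(50, 45, 10))  # 0.5
```

A TER of zero means the simulator time saved no aircraft time, and a negative TER indicates negative transfer (the simulator-trained group needed more aircraft time than the controls). Applied to the Lintern et al. (1990) study discussed earlier, 1.5 aircraft hours saved by 2 simulator hours gives a TER of 1.5/2 = 0.75.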

Simulators are widely used for training instrument skills, and Carretta and 
Dunlap (1998) concluded that “simulators provide an effective means to train instrument
procedures and flight control” (p. 4). They cite Pfeiffer, Horey, and Butrimas
(1991), who found a correlation of r = .98 between simulator performance and actual 
flight performance. 

Flight simulators can vary substantially in their fidelity to the actual flight environment
in terms of motion, control dynamics, visual scene, and instrumentation.
Some simulators are designed for whole-task training, whereas others are designed
as part-task trainers (e.g., intended only to train students in the use of the flight management
system or the aircraft pressurization system). In general, reviews such as
Carretta and Dunlap’s (1998) have shown that high-fidelity simulators are not necessary
for successful transfer of training.

Vaden and Hall (2005) conducted a meta-analysis to examine the true mean effect 
for simulator motion with respect to fixed-wing training transfer. Working with a 
rather small sample of only seven studies, they found a small (d = 0.16) positive effect 
for motion. (For an explanation of effect size, “d,” see Chapter 2 on statistics.) Thus,
although their study shows that motion does promote a greater transfer of training,
the relatively small effect may not be worth the considerable expense of a motion-based
simulator over and above the cost of a fixed-base simulator. Indeed, the
ultimate reason for using a motion-based simulator may not be greater transfer of
training but rather reduced simulator sickness. (For a review of simulator sickness
research, see Johnson 2005.)
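Chapter 2 treats effect size in detail; as a reminder of what a value such as d = 0.16 measures, the Python sketch below computes Cohen's d for a single study: the difference between the group means divided by the pooled standard deviation. This is one common formulation and may differ in detail from the variant used in Chapter 2 or by Vaden and Hall; a meta-analysis goes a step further and combines such per-study values into a weighted mean effect. The function name is our own.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment: list[float], control: list[float]) -> float:
    """Cohen's d: standardized mean difference using the pooled sample SD.

    Positive values mean the treatment group (e.g., trained with motion)
    scored higher than the control group on the transfer measure.
    """
    n1, n2 = len(treatment), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two scores")
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * stdev(treatment) ** 2 +
                  (n2 - 1) * stdev(control) ** 2) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)
```

With this definition, a d of 0.16 means the average motion-trained score sat only about one-sixth of a standard deviation above the average fixed-base score, which is why the text describes the effect as small.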

In four quasi-experiments, Stewart, Dohme, and Nullmeyer (2002) investigated 
the potential of simulators to replace a portion of the primary phase of U.S. Army 
rotary-wing training. In their studies, positive TERs were observed for most flight 
maneuvers from the simulator to the UH-1 training helicopter. Generally, student 
pilots who received simulator training required less training to reach proficiency on 
flight maneuvers than controls did. For example, the TERs for the maneuver “takeoff 
to hover” ranged from 0.18 to 0.32 for the four experiments, while TERs for “land 
from hover” ranged from 0.25 to 0.72. Because the simulator was undergoing continual
refinement during the course of the four experiments, Stewart et al. were able
to show that improvements in the visual scene and aerodynamic flight model could 
result in improvements to the TER. 

In a subsequent study (Stewart and Dohme 2005), the use of an automated hover 
trainer was investigated. This system utilized the same simulator as was used in the 
Stewart et al. (2002) series of experiments and incorporated a high-quality visual 
display system. A simple two-group design was used; 16 pilot trainees received the 
experimental training and 30 trainees served as controls. For the five hovering tasks 
that were practiced in the simulator, no instances of negative transfer of training 
occurred. In all cases, fewer iterations of the tasks were required for the simulator- 
trained subjects than for the control subjects. Stewart and Dohme conclude that these 
results show the potential for simulation-based training in traditionally aircraft-based 
primary contact tasks, in addition to its traditional role in instrument training. 

In a similar experiment (Macchiarella, Arban, and Doherty 2006) using civilian
student pilots at Embry-Riddle Aeronautical University, 20 student pilots received their initial
training in a Frasca flight training device (FTD) configured to match the Cessna 172S 
in which 16 control subjects were trained. Positive TERs were obtained for 33 out of 
34 tasks in this study. This is particularly interesting because the Frasca, in contrast 
to the simulators used in the Army studies, is a nonmotion-based simulator. 

5.5 TRAINING USING PERSONAL COMPUTERS 

In contrast to traditional simulators, which generally replicate the flight deck,
instruments, and controls of a particular aircraft with a fair degree of realism, several
recent studies have assessed the use of personal computers for training. Although some
studies of personal computer-based training devices (crude by current standards) were
conducted as early as the 1970s, the major impetus for this work began in the early 1990s with the
work of Taylor and his associates at the University of Illinois at Urbana-Champaign. 

As with traditional simulators, the initial studies were primarily concerned with 
the use of personal computer aviation training devices (PCATDs) for the training 
of instrument skills. Taylor et al. (1999) evaluated the extent to which a PCATD 
could be used to teach instrument tasks and the subsequent transfer of those skills
to an aircraft. They constructed a PCATD out of commercially available software 
and hardware and administered portions of two university-level aviation courses to 
students. Following instruction in the PCATD, the students received instruction and 
evaluation in an aircraft. TERs ranged from a high of 0.28 to a low of 0.12, depending
on the specific task. An interesting finding from this study was that the PCATD
was more effective for the introduction of new tasks than for the review of tasks 
previously learned to criterion level. 

In a subsequent study, Taylor and colleagues (2001) demonstrated again that these 
devices can be used successfully to teach instrument skills, with an overall TER of 
0.15, or a savings of about 1.5 flight hours for each 10 hours of PC-based training. 
An earlier study by Ortiz (1995) demonstrated a TER = 0.48 for PC-based training 
of instrument skills to students with no previous piloting experience. 

In addition to their use in acquiring initial instrument flight skills, PCATDs 
might also be of use in maintaining those skills. As we note in the following section, 
complex cognitive skills are subject to loss over extended periods with no practice. 
Whether PCATDs could provide that practice was the question addressed by Talleur 
et al. (2003) in their study of 106 instrument-rated pilots. They randomly assigned
pilots to one of four groups who, following an initial instrument proficiency check 
(IPC) flight in an aircraft, received training at 2 and 4 months in (1) an aircraft, (2) 
an FTD, (3) a PCATD, or (4) none (control group). At the 6-month point, all groups 
then received another IPC in an aircraft. 

By comparing the performance of these four groups on the final IPC, these 
researchers were able to demonstrate that the PCATD was effective for maintaining
instrument currency. The pilots who trained on the PCATD during the 6-month
period performed as well as those who trained on the FTD. Furthermore, both groups
performed at least as well as those who trained in the aircraft. An additional interesting
finding from this study was that, of the legally instrument-current pilots who
entered this study, only 42.5% were able to pass the initial IPC in the aircraft. 

Although PCATDs have been shown to be useful in the training of instrument 
skills, they have been less successful in dealing with manual or psychomotor skills. 
Two studies have failed to find transfer of manual flying skills from the PCATD to 
straight-and-level flight (Dennis and Harris 1998) or to aerobatic flight (Roessingh 
2005). However, the use of PC-based systems to teach teamwork skills has been successfully
demonstrated in a study of U.S. Navy pilots (Brannick, Prince, and Salas
2005). In that study, in a scenario executed in a high-fidelity flight simulator, pilots
who received training in CRM on a PC later demonstrated better performance compared
to pilots who received only problem-solving exercises and video games such
as those that have been used in commercial CRM training. 

5.6 RECURRENT TRAINING AND SKILL DECAY 

The purpose of recurrent or refresher training is to maintain skills that have been 
acquired through some initial training process. Obviously, if humans never forgot 
anything and if motor skills could be maintained at a high level of performance 
indefinitely without practice, recurrent training would not be necessary. Sadly, 
humans forget. In fact, they forget many things rather rapidly. Even though it is said
that once one learns how to ride a bicycle, one will never forget, most people will
notice a decline in their bicycling proficiency after a long period of inactivity. A person
may still be able to ride the bicycle, but should not try anything fancy.

However, not all skills decay at the same rate. Memories and cognitive skills
tend to be lost much faster than motor skills. Thus, a pilot coming back to aviation
after a long absence may find that he or she can still fly the plane, but cannot
remember how to get taxi and takeoff clearance and does not have a clue as to how 
to compute density altitude. This phenomenon is known as skill decay. Skill decay 
“refers to the loss or decay of trained or acquired skills (or knowledge) after periods 
of nonuse” (Arthur et al. 1998, p. 58). 

Difficulties associated with skill decay are exacerbated by the current generation
of cockpit automation, which tends to place pilots in a passive, monitoring mode.
However, when something fails, pilots must immediately take positive control
of the aircraft and, in some instances, perform tasks that they have been trained to
accomplish only infrequently or not at all. Amalberti and Wibaux (1995) point out that in
many cases manual procedures are no longer taught. The example they cite is the 
use of the brake system during rejected takeoff on the Airbus A320. In that aircraft, 
use of the auto brake is mandatory. Hence, pilots are not trained to use the manual 
brake. It is interesting to speculate what will happen when, as all things must, the 
auto brake system fails, and manual braking must be used. 

Prophet (1976) conducted an extensive review of the literature on the long-term 
retention of flying skills. His review covered some 120 sources for which abstracts 
or annotations were available, predominantly from military sources. His results suggest
that basic flight skills can be retained fairly well over extended periods without flying.
However, significant decrement occurs, particularly for instrument and procedural 
skills. He notes a consistent finding that continuous control (i.e., tracking) skills are 
retained better than the skills involved in the execution of discrete procedures. One 
such example is the finding by Wright (1973) that basic visual flight skills remained 
generally acceptable for up to 36 months, while instrument flight skills fell below 
acceptable levels within 12 months for about half of the pilots. 

In a controlled study of the retention of flying skills (Childs, Spears, and 
Prophet 1983), a group of 42 employees of the Federal Aviation Administration 
received training necessary to qualify them for the private pilot certificate. Using 
a standardized assessment procedure, their proficiency was then reassessed 8, 
16, and 24 months following award of their certificates. The authors reported a 
decline in the mean percentage of correctly performed measures, beginning with 
90% and declining steadily to approximately 50% at the 24-month check. They 
concluded that “recently certificated private pilots who do not fly regularly can 
be expected to undergo a relatively rapid and significant decrement in their flight 
skills” (p. 41). 

Childs and Spears (1986) reviewed the studies dealing with the problem of flight- 
skill decay. They suggested that cognitive/procedural skills are more prone than 
control-oriented skills to decay over periods of disuse. 

Casner, Heraldez, and Jones (2006) examined pilots’ retention of aeronautical 
knowledge in a series of four experiments in an attempt to discover characteristics 
of the pilots and their flying experiences that influence remembering and forgetting.
They used questions from the FAA private pilot written examination. 

In the first experiment, the average score on the 10-item multiple-choice test was
74.8%. Of the 60 pilot participants, 12 had scores in the range of 30–60%, substantially
below the minimum score (70%) required to pass the FAA certification examination.
The 20 pilots who held a private pilot license and were not pursuing any more
advanced rating had an average score of 69.5%. Because the national average score on
the FAA private pilot written examination is 85%, substantial forgetting of the
material had clearly occurred. The 20 certified flight instructors (CFIs), however,
had an average score of 79%, suggesting that they rehearsed their knowledge more
often than the other pilots did.

Although little correlation was observed between the test scores and total flight
time, significant correlations were obtained between the test scores and recent flight
experience (over the previous 3 and 6 months). Most of this association may be
attributed, however, to the strong correlations obtained for the CFIs (r = .34 and
r = .52 for 3 and 6 months, respectively).

In their second experiment, Casner et al. asked 24 active pilots who had flown
only one make and model of aircraft to perform weight and balance calculations for
that aircraft and for another aircraft with which they had no prior experience. They
found that whereas the pilots retained the knowledge of how to make the computations
for their own aircraft, they performed considerably more poorly with the
unfamiliar aircraft. Specifically, with the unfamiliar aircraft they recognized a
“no go” situation only 50% of the time.

Casner et al. concluded that “the certificates and ratings held by pilots have little 
influence on how well those pilots retain what they have learned during training” 
(p. 93). They suggest that there is a need for more explicit standards for ongoing 
aeronautical knowledge proficiency as well as some alternative methods for ensuring
that pilots maintain their knowledge. In addition, they suggest that the current
practices of aviation education—specifically the emphasis on abstract facts, remote 
from practical application—may be implicated in this failure to retain important 
aeronautical knowledge. 

Arthur et al. (1998) used meta-analytic techniques to review the skill retention 
and skill decay literature. They noted that the skill decay literature has identified 
several factors that are associated with the decay or retention of trained skills. The 
most important of those factors were 

• the length of the retention interval—longer intervals produce more decay; 

• the degree of overlearning—overlearning aids retention and the amount of 
overlearning is the single most important determinant of retention; and 

• task characteristics—open-loop tasks like tracking and problem solving are 
better retained, compared to closed-loop tasks, such as preflight checks. 

From their analysis of 53 articles, Arthur and colleagues found that “physical, natural,
and speed-based tasks were less susceptible to skill loss than cognitive, artificial,
and accuracy-based tasks” (1998, p. 85).
These conclusions are reflected in the findings of a study on the forgetting of
instrument flying skills (Mengelkoch, Adams, and Gainer 1971). In that study, two 
groups of 13 subjects with no prior flight experience were given academic training 
and instrument flying instruction in a simulator. One group received 5 training trials 
while the other group was given 10 training trials. Both groups were able to complete 
a program of maneuvers and flight procedures successfully at the conclusion of the 
training, although the group with the additional training performed substantially better
than the other group (95% and 78% correct performance for the 10-trial and 5-trial
groups, respectively). After a 4-month interval, the 10-trial group had a 16.5% loss in 
performance of procedures, while the 5-trial group had a 20.1% loss. Retention loss of 
the flight control parameters was generally much less, with only altitude and airspeed 
suffering a significant loss over the retention interval for both groups. 

In a study by Ruffner and Bickley (1985), 79 U.S. Army aviators participated in 
a 6-month test period in which they flew zero, two, four, or six contact and terrain 
flight tasks in the UH-1 aircraft. Their results indicated that the average level of
performance in helicopter contact and terrain flight tasks was maintained after 6 months
of no practice. Further, intervening practice flights (up to six) did not significantly 
improve the average level of performance. These findings were true regardless of 
total career flight hours or whether the tasks were psychomotor or procedural. 

5.7 CONCLUDING REMARKS 

Training is a vast subject that draws upon many disciplines and scientific and technical
traditions, from learning theorists to computer scientists and simulator
engineers. For the most part, our current training programs produce graduates with
the skills necessary to be good pilots. Research today is largely concerned with working
at the edges to improve efficiency, reduce costs, and refine training content. At
the risk of doing a severe injustice to the extensive body of work that we have touched
upon, sometimes only briefly, in this chapter, let us offer the following summary:

• A well-designed pilot training system requires analysis and careful planning
in order to produce a quality product.

• CRM has been a topic of debate for several years, with no clear resolution in 
sight. Whether it actually results in improved safety is still open to question. 

• Simulators are unquestionably valuable tools in aviation training and have 
consistently been shown to have a positive TER, in addition to saving money 
and allowing us to train for hazardous situations safely. 

• Personal computers have made their way into aviation training. There 
seems to be little doubt that they can be used successfully for both initial 
and refresher instrument training. Whether they can also be used for contact
training remains to be seen.

Before we leave the subject of training, let us make one final observation
about the current practice of aviation training. In particular, we are concerned with
the pervasive practice of teaching subjects, such as meteorology, in a decontextualized
manner. That is, the typical aviation training school offers a course on meteorology
in which students learn the names of all the clouds, memorize the symbols that
indicate wind speed and direction from the meteorological charts, and learn how to 
decipher the abbreviations contained in the METAR/TAF reports. Usually, all this 
takes place without any reference to the context in which such knowledge would be 
useful and applied. Hence, students memorize the content without learning how to 
apply that knowledge when planning and conducting flights. Nor do they learn to 
appreciate the significance of the information they are learning from an operational 
or safety standpoint. The need for instruction to take place within the context of its 
application has been discussed cogently by Lintern (1995), who refers to this concept 
as situated instruction. 

This chapter has dealt at length with the rational processes of training system 
design exemplified by the SAT/ISD process; however, as Lintern (1995) notes, this 
process can have the effect of removing learning from the context in which it is 
to be applied. It is well to keep in mind that, eventually, the pilot must integrate 
all that he or she has learned in order to conduct a flight successfully and safely. 
Overcompartmentalization and a rote-learning approach to instruction, even if the
instruction includes all the separate skill elements, may not achieve the goal of
allowing the pilot to generalize from the classroom setting to the flight deck at the
time at which the knowledge must be applied. Training must always be planned and 
conducted with this ultimate goal in mind. 
6 Stress, Human Reactions,
and Performance


6.1 INTRODUCTION 

An important part of psychology is the study of variations in how we think, feel, and
react. Although it is important to be aware of such variations, there are also a number
of commonly shared patterns of reaction, for example, to dramatic events.
Hence, this chapter discusses both individual differences and common traits in reactions
to everyday stress and to more significant incidents. The chapter also investigates
common psychological reactions in passengers.

6.2 PERSONALITY 

Personality is a sweeping construct. It may be defined broadly as every internal 
factor that contributes to consistent behavior in different situations or, narrowly, as 
encompassing only emotions and motivation. A broader definition of personality 
may include intelligence; traditionally, however, personality has been considered 
separate from intelligence and skills. This distinction is evident in psychological 
tests, which are usually divided into ability tests and personality tests. Ability tests 
often include time constraints, and the objective is to get as many correct answers as 
possible, whereas personality tests seek to measure typical response patterns—that 
is, how an individual usually reacts to a given situation. 

For a long time, psychologists have debated how many personality traits or
dimensions are necessary to describe someone. Imagine describing a long-time friend.
Which adjectives and examples should be used? Perhaps the words “great,” “friendly,”
“humorous,” or “reliable” spring to mind. If one were to describe a person of whom
one was less fond, perhaps words such as “aggressive,” “cynical,” “egotistical,” or
“prejudiced” would be used. If such descriptions are collected and systematized using
factor analysis (see Chapter 2), five general categories emerge. These five factors are
normally referred to as “the big five”: extroversion, agreeableness, conscientiousness,
neuroticism, and openness to experience (Costa and McCrae 1997). Some measures use
emotional stability instead of neuroticism; in other words, the dimension is named
after the positive end of the scale. Refer to Table 6.1 for examples of characteristics
that lead to high or low scores for the different dimensions.

Most techniques for measuring personality characteristics use statements combined
with a point scale ranging from one to five (or, in some cases, one to seven) on which
subjects indicate their level of agreement. Combinations of positively and negatively
phrased statements are often used for the different dimensions. For example,
“I am often anxious” may be used instead of “I am never anxious.”

TABLE 6.1
Overview of Traits Included in the Five-Factor Personality Model

Characteristics/Traits | Characteristics of Individuals with Low Scores | Characteristics of Individuals with High Scores
Extroversion | Passive, quiet, introverted, reserved | Open, talkative, energetic, enjoys social situations
Agreeableness | Cold, cynical, unpleasant, directly expresses aggression | Gentle/kind, cooperative, avoids conflict, credible
Conscientiousness | Unreliable, disorganized, irritable, prefers not having plans | Conscientious, responsible, organized, goal oriented
Neuroticism | Calm, not neurotic, comfortable, deals with stress | Uneasy, worried, nervous, emotional
Openness to experience | Conventional, practical, down-to-earth | Intellectual, cultural, open to new experiences
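Because positively and negatively phrased statements are mixed, ratings on the negatively phrased items have to be reverse-scored before the items are combined into a score for a dimension: on a 1-to-5 scale, a rating r becomes 6 - r. The Python sketch below illustrates this under stated assumptions. The 1-to-5 scale, the averaging of items, and the three anxiety statements are hypothetical illustrations, not items or scoring rules taken from the NEO-PI or any other published inventory.

```python
# Minimal sketch of scoring one personality dimension from agreement ratings.
# Assumptions (hypothetical): a 1-5 agreement scale, items averaged into one score,
# and illustrative item texts that are not taken from any published inventory.

SCALE_MIN, SCALE_MAX = 1, 5


def reverse_score(rating: int, lo: int = SCALE_MIN, hi: int = SCALE_MAX) -> int:
    """Flip a rating on a negatively phrased item so that it points the same way
    as the positively phrased items (1 <-> 5, 2 <-> 4, 3 stays 3)."""
    if not lo <= rating <= hi:
        raise ValueError(f"rating {rating} outside scale {lo}-{hi}")
    return lo + hi - rating


def dimension_score(responses: dict[str, int], reversed_items: set[str]) -> float:
    """Average the item ratings after reverse-scoring the negatively phrased items."""
    scored = [
        reverse_score(rating) if item in reversed_items else rating
        for item, rating in responses.items()
    ]
    return sum(scored) / len(scored)


# Hypothetical neuroticism items: agreeing with "I am never anxious" indicates
# LOW neuroticism, so that item is reverse-scored.
answers = {"I am often anxious": 4, "I am never anxious": 2, "I worry a lot": 5}
print(dimension_score(answers, reversed_items={"I am never anxious"}))
```

Without the reversal, the respondent's disagreement with "I am never anxious" would pull the neuroticism score down even though it expresses the same tendency as agreeing with "I am often anxious".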

Factor analysis of adjectives (e.g., “kind,” “friendly,” “firm,” and “anxious”) and
longer statements has been the starting point for the five-factor model. It is also
possible to break the five main factors into subfacets if a more detailed description
of the person is required. Studies have shown that this five-factor solution can be
replicated across languages and cultures, and a satisfactory correspondence
between subjects’ descriptions of themselves and how others perceive them has
been established, particularly when the others are persons who know them well (see,
for example, Digman 1990).

However, not all researchers agree that the five-factor model represents a comprehensive
description of personality. Some think it contains too few (or too many)
traits. Others find the model simplistic or think that it fails to explain “how we have
developed into being who we are” (see Block 1995 for a critical analysis). Despite
these criticisms, the model has been widely used in research involving personality
and appears to be widely accepted as a good starting point for personality assessments
(see, for example, Digman 1990; Goldberg 1993). Studies have demonstrated
a considerable heritable component in personality characteristics and have shown that
personality traits develop until the age of 30, after which they remain relatively stable
(Terracciano, Costa, and McCrae 2006).

In the 1980s, a series of personality inventories was developed to map the big five
traits. Costa and McCrae’s NEO personality inventory (NEO-PI) (1985) is particularly
well known. The five traits are presumed to be relatively independent of abilities;
one exception, however, is “openness to experience,” which, to a certain extent,
correlates with intelligence (Costa and McCrae 1985). These inventories share the
advantage of generally high reliability. In terms of predictive validity, several
of the dimensions have proven to be associated with work achievements, although the
correlations are described as small or moderate (see Chapter 4). A meta-analysis of
the relation between personality traits and accidents revealed conscientiousness
and agreeableness to be systematically associated with accident involvement: People
with low conscientiousness and agreeableness scores had experienced a greater number
of accidents than those with higher scores (Clarke and Robertson 2005).

Whereas these measurement techniques were designed to measure aspects of an
individual’s personality, other methods or diagnostic systems are used
to document problems, such as high levels of anxiety and depression, or to establish
whether someone is suffering from mental illness.

In addition to the five-factor model and its associated empirical systems, specific
traits are often used to describe personalities or aspects thereof that may be of
importance in some situations. These traits are typically tied to certain theories or
are particularly suited to explain reactions (or predict behavior) in certain situations.
Examples of such traits are type A behavior, locus of control (LOC),
psychological resilience, and social intelligence. Type A behavior and LOC are discussed
in detail in Section 6.6 on individual differences and stress.

Psychological resilience has been studied in particular in relation to people who
thrive despite challenges and misfortune. Important factors here are personal attributes
such as social skills and leading a structured life; however, external support
from family and friends is also important (Friborg et al. 2005). Social intelligence
usually refers to social skills and the ability to understand one’s own and others’
reactions (Silvera, Martinussen, and Dahl 2001). These traits are related, to a greater
or lesser degree, to the personality traits contained in the five-factor model. For
example, LOC is related to neuroticism: Those with an internal LOC are considered more
emotionally stable. In addition to the traits described in the scientific literature, a
number of poorly documented theories of personality can be found in popular science
magazines and the like, often accompanied by tests that reportedly measure the traits
these theories propose. However, documentation to support the reliability
and validity of such tests is usually scant.

6.3 WHAT IS STRESS? 

We are continually bombarded with influences, expectations, and demands placed
on us by our surroundings. Work commitments or the lack of time and resources to
complete tasks are typical examples. Both paid work and unpaid work (e.g., caring
for family members) are relevant sources of such demands. To meet social demands
or solve work-related tasks, the individual relies on different sets of resources,
including knowledge, experience, and personal attributes. Some theories describe
stress as the result of factors or elements that have a negative impact on the individual,
for example, distracting noise or pressure at work (stimulus-based theories).
Other theories are concerned with the consequences of stress, such as various
emotional and physical reactions (response-based theories). The latter tradition is
exemplified by Selye (1978), who describes a general stress response that is valid for
everyone and consists of three phases: the alarm phase, the resistance phase, and
the exhaustion phase.

A more modern understanding regards stress as the interaction
between demands and the resources available to the individual. When the demands
placed on individuals exceed their resources, stress develops. In these
interaction models, an important point is that the person must evaluate the demands
and consider whether they exceed his or her resources. Because of this cognitive
evaluation, what one individual considers a stressor is not necessarily considered
a stressor by someone else (noted in, for example, Lazarus and Folkman 1984).
Balance between external demands and personal attributes is perceived as challenging
and satisfying by the individual (Frankenhaeuser 1991), whereas imbalance is a
precursor to emotional, physical, and behavioral consequences.

Frankenhaeuser’s (1991) bio-psychosocial model (depicted in Figure 6.1) delineates
the relationship between stress and health. In this model, the person is subjected
to various demands, such as intense workloads, time constraints, shift work,
problems, or conflicts. The person weighs these demands against his or her resources,
including experience, physical and mental health, personal abilities, and, potentially,
external support. If demands surpass the person’s resources, stress ensues, accompanied
by both psychological and physiological reactions. Various stress hormones
(adrenaline, noradrenaline, and cortisol) are immediately released into the body.
These hormones produce a number of advantageous effects in precarious
situations; however, problems may arise if the individual is exposed to these effects
for an extended period of time. If a person is continually stressed or if there is not
enough time to rest, the body is unable to normalize the physiological reactions
in time for the next work session. Stress is also an unpleasant experience, with
short-term and long-term consequences for the affected person’s productivity. We
will discuss this further in a later section.

FIGURE 6.1 The Frankenhaeuser (1991) bio-psychosocial stress model. (Reproduced with
the kind permission of Springer Science and Business Media.)

Other models describe work-related stress, such as Karasek’s demand-control
model (Karasek 1979; Karasek and Theorell 1990), which describes how stress
relates to various consequences such as health risks and behavior within the
organization. In this model, work-related demands are described as “high” or “low”;
similarly, the individual’s ability to affect or control the situation is deemed “high”
or “low.” Combining high demands with low levels of control increases the risk
of psychological impacts and physical illness, such as cardiovascular diseases
(Yoshimasu 2001). On the other hand, combining high demands and a high level
of control encourages learning and has a motivational effect. Later expansions of
this model have pointed out that social support, such as assistance and encouragement
by colleagues, may reduce stress and minimize risks associated with negative
consequences of stress. There are several forms of social support, such as care and
empathy, assistance of a more practical nature, and recognition, such as being praised
for doing a good job.
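Crossing the two dichotomies of the demand-control model gives four cells. In Karasek's terminology these are usually called "high strain" (high demands, low control), "active" (high demands, high control), "low strain" (low demands, high control), and "passive" (low demands, low control); the discussion above concerns the two high-demand cells. The Python sketch below illustrates the classification. The 1-to-5 rating scale, the cutoff of 3, and the names `JobType` and `classify` are hypothetical choices made only for illustration; the model itself does not prescribe a numeric cutoff.

```python
from enum import Enum


class JobType(Enum):
    """The four cells of the demand-control model."""
    HIGH_STRAIN = "high strain"  # high demands, low control: linked above to raised health risk
    ACTIVE = "active"            # high demands, high control: linked above to learning and motivation
    LOW_STRAIN = "low strain"    # low demands, high control
    PASSIVE = "passive"          # low demands, low control


def classify(demands: float, control: float, cutoff: float = 3.0) -> JobType:
    """Place a job in one of the four cells.

    Hypothetical assumption: demands and control are rated on a 1-5 scale and a
    rating at or above `cutoff` counts as "high".
    """
    high_demands = demands >= cutoff
    high_control = control >= cutoff
    if high_demands:
        return JobType.ACTIVE if high_control else JobType.HIGH_STRAIN
    return JobType.LOW_STRAIN if high_control else JobType.PASSIVE


# Example: heavy workload with little say over how the work is done.
print(classify(demands=4.5, control=2.0).value)
```

A dichotomous cutoff mirrors the "high" versus "low" description of the model; in research use, demands and control are often treated as continuous scores instead.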

6.4 CONFLICTS BETWEEN WORK AND PRIVATE LIFE 

Today, many of us choose to combine a career with family life. This means that many
people must be adept at several roles (parent, partner, employee, and so forth). The
total workload is substantial and may leave insufficient time for recreation and
rest. Thus, conflicts may arise from the interaction between work and private life.
At the same time, having multiple roles can have positive aspects, such as increased
self-confidence and greater financial freedom. Work-to-home conflict can take several
forms. One is that time management becomes difficult, and it seems like
“there are not enough hours in the day.” Another is that work causes stress and
exhaustion, leaving one unable to spend as much quality time with family as one
would like.

Several studies have described a connection between work-home conflicts and
burnout (Martinussen and Richardsen 2006; Martinussen, Richardsen, and Burke
2007) and between work-home conflicts and reduced satisfaction with one’s partnership,
as well as reduced job satisfaction (Allen et al. 2000). Some studies have pointed
to a so-called “crossover effect” between partners: Stress and tension experienced
at work by one person are transferred to his or her partner, who subsequently has to
deal with the stress by serving as a buffer (Westman and Etzion 1995). This transfer
may occur because the person empathizes with his or her partner; however, a
more direct effect is also plausible because exhausted and frustrated persons have
“less to give” when they come home from work. A study of couples with young children
revealed that men were more likely to become passive and withdrawn upon returning
home after a difficult day at work, and women were more likely to become aggressive
(Schultz et al. 2004). In short, the study indicated that having a difficult day at
work might have consequences for one’s partner and that there are gender-related
differences in how people react to such situations. These findings have since been
supported by a survey of male air traffic controllers that revealed that they often
reacted with withdrawal after a stressful day at work. The study also revealed that
satisfaction was greater when their partners accepted such behavior.

Although fewer studies have investigated how family or private matters negatively
affect job performance, we can safely assume that such effects would be
undesirable. Some examples of demanding situations at home include dealing with
disease, partnership breakdowns, and caring for many young children. However,
negative emotions are not the only emotions transferable between home and work.
Positive experiences at work may transfer to family life as one arrives home contented
and uplifted; conversely, positive events at home may lead to a better day at
work. Arguably, people with family commitments may find it easier to set boundaries
for work commitments that, in the absence of a family, might have
absorbed a greater part of the day. Thus, family commitments become a legitimate
excuse to the employer and, not least, to oneself. In particular, young, single
professionals presumably experience greater pressure to perform and are more likely
to work longer hours, indicating that individuals with family commitments are not
the only ones struggling to find the balance between work and leisure. Further,
modern communications (such as e-mail and cell phones) may enable a person to
continue working even after official work hours.

Several studies have indicated the continuation of traditional labor-sharing practices
in households in which women account for cooking, cleaning, and caring for
children and men are mainly responsible for tasks such as maintenance and car repairs
(Lundberg and Frankenhaeuser 1999; Lundberg, Mårdberg, and Frankenhaeuser
1994; Østlyngen et al. 2003). A Norwegian study involving parents with young children
(0–6 years) demonstrated that women did about 70% of the domestic work;
however, this study included a relatively high percentage of mothers who worked
part-time (Kitterød 2005). Nonetheless, the study indicates a larger total workload
for women in comparison to men, leaving them with less time (after
finishing the day’s paid and domestic work) available for relaxation and recreation.

A study of junior managers employed at the car manufacturer Volvo in Sweden
revealed that stress hormone levels were equal in female and male managers during
work hours, but a difference was noticeable after work hours. Rising levels were
recorded in women between 6:00 and 8:00 p.m., whereas in men the corresponding
values decreased during the same period (Frankenhaeuser 1991). The physiological
data were consistent with self-reports of weariness. Thus, the male managers
started relaxing immediately after work, whereas the female managers did not until
much later in the evening. The women therefore had less time available for relaxation
than the men, possibly incurring negative health consequences in the long
term (Lundberg 2005).

6.5 BURNOUT AND ENGAGEMENT 

Burnout can be regarded as a stress reaction occurring after long-term, work-related
demands and pressure. Maslach and Jackson (1981, 1986) have defined burnout as
a three-dimensional psychological syndrome consisting of emotional exhaustion (a
condition including overwhelming emotional and physical exhaustion), depersonalization
(characterized by negative emotions and cynical attitudes toward the recipients
of one’s service or care), and reduced personal accomplishment (a tendency to evaluate
one’s own work negatively). Initial studies of burnout were based on workers
in care professions such as nursing, and burnout was then considered to be triggered
by high interpersonal demands. More recently, however, burnout has been found in
professions that do not necessarily involve caring for patients, clients, or pupils. The
three dimensions have subsequently been generalized into the dimensions of exhaustion,
cynicism, and professional efficacy (Richardsen and Martinussen 2006).

A number of work environment factors have proven to be associated with burnout.
Leiter and Maslach (2005) have described six categories of such factors. One
of these is workload: too much to do (or not enough time available to do it) or
insufficient resources to solve given tasks. Insufficient control or autonomy in the
workplace is also associated with burnout. If a person feels powerless to control resources
required to complete tasks or unable to influence how the job is done, reduced personal
accomplishment may result.

Some work-related factors have a preventive effect on burnout or serve as a “buffer.”
These factors include social support or assistance provided by management or
colleagues. Being rewarded and acknowledged and feeling fairly treated (in terms of
promotions, etc.) are also positive resources. Sometimes employees may find that the
organization has values different from their own, for example, when they are told to
withhold information or deceive someone. At present, however, what happens when values
held by employees differ from those of employers has been insufficiently researched.

Although early research into burnout was directed at particular jobs (such as the
previously mentioned nurses), people in demanding care professions are not the only
ones at risk. Few studies have been aimed at burnout in aviation professions, and most
work-related stress studies focus on short-term effects, such as measuring blood pressure
changes in air traffic controllers when exposed to elevated air traffic density levels.

A study of Norwegian air traffic controllers concluded that the burnout rate was
not significantly higher for them than for other professions included in the study.
Both the levels of conflict experienced and work-home conflicts were associated
with exhaustion in this group (Martinussen and Richardsen 2006). Most people
would consider air traffic control to be a highly stressful profession; thus, it is
surprising that this group did not have elevated burnout rates. On the other hand, air
traffic controllers go through a process of strict selection, education, and training to
enable them to perform extremely demanding tasks. Hence, there appears to be a balance
between the tasks to be solved and the skills and abilities of the employees. This does not
mean that the individuals are immune to the demands and requirements of the organization
or that access to resources would not have a positive effect. Similar results
have been found for the police profession, whose members consider organizational issues
to be more frustrating and demanding than the job itself (Martinussen et al. 2007).

Traditionally, studies into work environments and burnout have focused greatly
on negative aspects and attempted to establish the illness-inducing sides of the work
environment (Maslach, Schaufeli, and Leiter 2001). Recently, however, researchers
have turned to studying the opposite of burnout, namely engagement,
to find out what causes this outcome. Schaufeli and Bakker (2004) present a model
that describes how resources and workplace demands relate to both engagement
and burnout (Figure 6.2). This model tells us that burnout is, first and foremost,
associated with demands and, second, with a lack of resources; engagement is
predominantly associated with access to resources such as rewards, recognition, and
support. Burnout and engagement have consequences for the organization. Burnout
implies negative consequences, whereas engagement has a positive impact. Examples
of organizational consequences are intention to quit, work satisfaction, work
performance, and feeling commitment to the organization. Burnout also has negative
consequences for the affected individual’s health and quality of life.

FIGURE 6.2 The job demands-resources model for burnout and engagement.

6.6 INDIVIDUAL DIFFERENCES AND STRESS 

There is little doubt that some working environments or factors are generally considered
stressful. Yet, people with certain personality characteristics experience stress
more often and more intensely than others. It is therefore of interest to study these
differences in detail. Studies have shown that neuroticism is associated with burnout.
Although some of the other big-five personality characteristics also have been found
to be associated with burnout, findings vary greatly between studies.

Working with individuals suffering from cardiovascular disease, two physicians
(Rosenman and Friedman 1974) claimed to have observed certain recurring traits in
their patients. They labeled some of their patients’ behavior type A, characterized
by irritability, time urgency, competitiveness, aggression, hostility, and
ambition. Patients who did not display these properties were labeled type B. Several
methods of measuring and mapping type A personalities have since been developed,
of which the best known and most widely used is the Jenkins activity survey. This
method condenses the characteristics mentioned into two dimensions: impatience-irritability
and achievement strivings. The latter represents a positive side, namely,
that the person sets goals and works hard to achieve them. The former dimension
represents the less positive side and is characterized by impatience and aggression.
Impatience-irritability is related to health issues, while achievement strivings are
associated with enhanced work achievements and better student grades.

Researchers have established that type A behavior has a significant heritable
component; type A has also been found to be a risk factor in the development
of cardiovascular diseases (see, for example, Yoshimasu 2001). The link is thought
to arise because type A behavior, in particular the impatience-irritability aspect, is
associated with greater physiological activation, including elevated blood pressure
and heart rate. However, lifestyle factors such as drinking and smoking may
also contribute to this pattern.

What are the consequences of type A behavior in the workplace? In fact, many
organizations are likely to reward type A personalities, at least the aspects relating
to achievement striving; however, irritability may cause problems for the
individual and his or her surroundings. Type A behavior may directly influence
work-related satisfaction and the experience of stress and burnout. The combination
of certain aspects of the work environment with, for example, impatience-irritability
may produce particularly unfortunate outcomes for some individuals. People with type A
personalities may even intuitively choose professions that are more challenging or involve
a faster pace and heavier workload. It may also be that type A personalities shape
their work environment in certain ways and thereby contribute to creating a more
stressful environment.

Another personal attribute under investigation is locus of control (LOC): the extent to
which a person feels that he or she can influence or control events and situations.
Internal LOC has been shown to be associated with several work-related variables,
including higher motivation and commitment (Ng, Sorensen, and Eby 2006).
People who use active coping techniques (i.e., who act strategically to handle difficult
situations) generally score lower on burnout than people who use more emotion-focused
coping techniques (e.g., attempting to deal with their emotions by seeking
comfort from other people).

People also differ according to gender, age, and other circumstantial factors.
Generally, correlations between burnout and factors such as age and gender are low.
Younger professionals have sometimes been found to be more prone to burnout,
but in other studies (e.g., of air traffic controllers and police officers), the “age effect”
is reversed (Martinussen and Richardsen 2006; Martinussen et al. 2007). In any case,
such correlations are weak. The same applies to gender differences: Some
studies find that women report a greater degree of exhaustion than men and that men
have higher cynicism scores; others find no gender differences.

6.7 CONSEQUENCES OF STRESS 

Stress has both short-term and long-term consequences. The following section discusses
emergency situations and how they affect an individual’s job performance.
A critical stress situation may arise when something unusual happens during
a flight, such as indications of technical problems or a rapid deterioration
in the weather. It is important to know the typical reactions in such situations,
to be aware of how the crew responds to stress, and to understand
how stress affects decisions made during the flight.

It is difficult to measure how stress affects various cognitive functions, because
exposing experimental subjects to stressful situations to study how they react raises
ethical concerns. One possible solution to this problem is to use a simulator to
study how people handle various abnormal situations. This provides a controlled
environment in which emergency situations can be created, workload and
time pressure increased, and reactions recorded. Although many simulators are
highly realistic, they will never be identical to real-world experiences; thus, the
possibility that findings do not transfer to real-world situations must not be
ignored.

A second option is to study incidents that have already taken place, reconstruct
the decisions made, and examine how stress contributed to the event. The drawback
of this method is that it relies on people’s recollection and perception of the event.
Moreover, a situation may appear quite different in hindsight from how it actually
unfolded and was experienced by the people involved. Whichever design is chosen,
there are challenges and shortcomings; nevertheless, some findings on the immediate
effects of stress on cognitive functions and decision-making capacity are consistent.

According to a review by Orasanu (1997), stress may have the following effects:

• People make more errors.

• Attention narrows, causing tunnel vision or selective hearing.

• Visual scanning becomes more erratic.

• Short-term memory capacity is reduced.

• Strategies change: speed is preferred over accuracy, people act as though
time limits apply, and strategies are simplified.

Thus, stress has a number of consequences for cognitive functions, affecting
how we perceive our surroundings, process information, and make decisions.
An important aspect of aviation is the need to take in and monitor
information constantly. In a high-stress situation, this capacity diminishes,
reducing the ability to understand what is being said over the radio or by another
person. Similarly, information given to someone under stress may be
poorly understood or not understood at all.

By short-term memory, we mean the processes or structures that contribute to
the temporary storage and processing of information. It enables us, for example, to read
a couple of sentences while keeping track of the last word in each sentence and then
repeat those words in the correct order. Some argue that short-term memory has
clear limitations, and early studies found a short-term memory capacity in adults of
about seven digits (±2). However, later studies have shown that the matter is more
complex. For instance, grouping a set of digits and remembering them as one unit
(e.g., “2,” “4,” and “5” as “245”) makes it possible to recall even more digits. On the
other hand, more complex elements (e.g., words) are more difficult to remember,
reducing the number of retainable elements to fewer than seven. In stressful situations,
this capacity is further reduced, with consequences for the ability to perform basic
mental arithmetic, such as calculating the remaining flight time when the fuel tank
is half full.

In connection with incidents and accidents, much attention has been paid to decision
making: How was the situation perceived, and what course of action was chosen?
Klein (1995) has studied decision making in real-life situations, both in aviation and in
general, for many years and has developed a model called recognition-primed decision
(RPD) making. In this model, experts use their knowledge to recognize or identify
the problem and choose a solution that has proven successful in previous,
similar situations. If that solution suits the current situation, it is applied;
only one solution is considered at a time.

This model has since been extended by Judith Orasanu (1997), who describes a
model that can be used even in unfamiliar situations. The first step is to
evaluate the situation: What is the problem? Are the warning signs clear and
unambiguous, or are they changing? Experts in such situations often also consider
how much time is available to take the necessary steps, and they evaluate the level
of risk involved. Next, the person must determine whether any existing procedure is
available to remedy the situation. Perhaps there is more than one solution? It is,
of course, also possible that there are no known solutions, which makes it necessary
to create a new and untested course of action.

In general, cognitive processes that involve retrieving information from long-term
memory are resilient to stress, whereas processes that require the use of short-term
memory are more vulnerable. In other words, if the warning signs are well known
and clear and a standard solution is applicable, performance in the situation will not be
especially vulnerable to stress. On the other hand, if the signs are obscure or keep
changing and multiple courses of action must be considered, performance is vulnerable
to stress. Perhaps it is not surprising that experienced pilots make fewer mistakes under
pressure than less experienced pilots: They have more experiences stored in
their long-term memory and are more likely to use a rule-based approach rather than
having to consider several options or even improvise new solutions.

It is therefore important to understand acute stress and how it affects us;
for example, a common mistake under stress is to assume that less time is available
than is actually the case. Another consequence of stress is the simplification of
strategies, such as preferring speed over accuracy. Training in managing stressful
situations is important, both to become familiar with critical situations and to learn
how they can be resolved. It is also important to know strategies for reducing workload
in stressful situations, for example, how best to distribute tasks among the members
of the crew.

6.8 SHIFT WORK 

Shift work is commonplace for many people in aviation, as are night shifts for some
workers. Crew members who travel across time zones also experience problems due
to jet lag. The term shift work usually refers to work taking place outside regular
daytime hours (6 a.m.-6 p.m.); however, there are different types of shift work
arrangements, many of which involve some form of rotating shifts. Shift work has
several consequences, relating to health, sleep, work performance, the risk
of accidents, increased work-home conflict, and participation in social activities
that take place on weekday evenings and weekends. Working shifts may also influence
one’s relationship with the employer, for example, in the form of reduced job
satisfaction (Demerouti et al. 2004).
6.8.1 Sleep

Sleep difficulties are among the most common problems associated with shift work
in general and night shifts in particular. In human beings, body temperature and the
production of hormones, stomach acid, and urine follow a cycle of approximately
24 hours. External factors such as light exposure influence this internal clock (or
“body clock”) and its adjustment. Hence, at night, the body is prepared for something
quite different from work. Most adults sleep between 6 and 9 hours each night,
averaging 7 to 7.5 hours (Ursin 1996). Physiological measurements make it possible
to map the various phases of sleep, including the rapid eye movement (REM) phase,
in which most dreaming occurs. The different phases are repeated throughout the night.
Toward the end of the night, deep sleep subsides and it is common to wake up a couple
of times. The tendency to fall asleep varies with body temperature, which reaches its
lowest point in the early morning (between 4 and 6 a.m.); this is when it is most
difficult to stay awake (Pallesen 2006).

Several theories have been proposed on the function of sleep—that is, why we sleep. 
Some are based on the theory of evolution and postulate that it is safer to be inactive 
in darkness because we cannot see where we are going. A second group of theories 
argues that sleep has an important restorative function for the body and that certain 
types of hormones (which boost the growth of body tissues) are produced during sleep 
(Pallesen 2006). In addition, it appears that sleep has a restorative function relating to 
brain cells and their protection from cell degeneration processes (Pallesen 2006). 

Sleep is normally regulated by how long we have been awake as well as by our daily
routines and habits (Waage, Pallesen, and Bjorvatn 2006). Thus, it is more difficult
to sleep during the day than at night; people who go to bed after a night shift
commonly get less total sleep and more frequent sleep interruptions, and they
may have to get up to use the toilet even though this is normally not necessary when
sleeping through the night. There are, however, individual differences in tolerance
to night and shift work. Some studies have shown that older individuals (over the
age of 45) have greater problems sleeping or resting after working a night shift and
that these problems increase with age (Costa 2003). Some people, however, find it
easier to work night shifts as they get older. Greater experience with this type of
work and the acquisition of adaptive techniques are possible explanations in these
cases. Perhaps the worker’s children have grown up, making it easier within the family
to sleep during the day.

Typically, however, studies into the consequences of shift and night work are 
likely to be influenced by a “selection effect.” It is reasonable to assume that those 
who experience great discomfort with such work hours would quit the job after a 
period of time, resulting in a selection process in which those who experience less 
discomfort continue working. Estimates show that as many as 20% of shift workers 
quit after a relatively short period of time (Costa 1996). 

6.8.2 Health Implications 

Several studies have found adverse health effects in shift workers, including
increased risk of cardiovascular disease, digestive problems, and cancer
(Costa 1996). A longitudinal Danish study that followed a large number of subjects
over 12 years found that shift workers were more likely to develop cardiovascular
disease than daytime workers (Tüchsen, Hannerz, and Burr 2007).
Women working shifts more frequently report menstrual problems;
some studies have found a connection between shift work and miscarriage, low
birth weight, and premature birth (Knutsson 2003). A meta-analysis of 13 studies on
the link between night shift work and breast cancer in women suggested an increased
breast cancer risk. Approximately half of these studies were based on cabin crew,
while the remaining studies were based on women in other types of night shift work.
The reason for the elevated cancer risk is uncertain, but a possible explanation is that
working nights reduces the production of melatonin, which is thought to have a
cancer-preventing effect (Megdal et al. 2005).

Presumably, the negative health effects are direct consequences of disruptions to
the biological 24-hour rhythm, as well as of behavioral changes due to shift work. For
example, sleep problems can lead to the use of alcohol to induce sleep or
excessive smoking to stay awake at night. It is also possible that shift work contributes
to work-home conflicts, which heighten stress levels and further exacerbate the adverse
health effects of shift work.

6.8.3 Accident Risk 

A number of studies have examined how sleep deprivation affects performance and
how it relates to accidents. Laboratory studies of the connection between
reduced sleep and performance on cognitive and psychomotor tasks have shown that
tasks requiring sustained attention are more affected by sleep deprivation, whereas
more complex tasks, such as reasoning, are less affected (a summary can be found in
Akerstedt 2007). One study, in which subjects operated a flight simulator at night,
showed that their reduction in performance was equivalent to a blood alcohol level of
0.05% (Klein, Bruner, and Holtman 1970).

The consequences of sleep deprivation generally increase the longer a person goes
without sleep. However, the person’s daily rhythm also matters;
performance is better during the day than at night (Akerstedt 2007). Even
after a prolonged period of being awake, performance will improve somewhat
during the hours in which the individual is normally awake (the “daytime”
according to the person’s body clock).

Working shifts, and working night shifts in particular, is associated with an
elevated risk of accidents (a summary is provided in Folkard and Tucker 2003). Several
studies from the field of medicine show that doctors make more mistakes when they
are sleep deprived (e.g., during 24-hour shifts) and need more time to perform
basic tasks, such as intubating a patient (Akerstedt 2007). A number of studies
of motor vehicle accidents point to drowsiness, particularly when driving after
working night shifts, as an important factor in many accidents. In a survey of pilots
who were asked to describe how drowsiness typically affected them, the most
common symptoms were reduced attention and lack of concentration
(Bourgeois-Bougrine et al. 2003). Signs of tiredness in the rest of the crew appeared
primarily in the form of longer response times, more frequent errors, and poor
communication (Bourgeois-Bougrine et al. 2003).

6.8.4 Private Life 

There are individual differences in the ability to cope with shift work. These
differences are usually attributed to biological, social, and health factors, although
the organization of the work and the shift schedule also play a part. Employees who
are fortunate enough to have some flexibility and influence over the shift
roster are generally more satisfied and experience fewer problems associated with
shift work (Costa, Sartori, and Akerstedt 2006).

For most people, however, shift work is problematic and has significant
consequences for private life. It can be difficult to meet family commitments and
to participate in social activities, which typically take place in the afternoon or evening.
Having a day off on a weekday is not the same as having a day off on the weekend.
Family commitments may also make it more difficult to sleep during the day to recover
after a night shift or after being awake for a long period of time. Mood swings
after night shifts also place additional demands on the individual and the
family. A study of police officers in the Netherlands concluded that the timing of
shifts is a decisive factor in the level of work-home conflict and
recommended avoiding shifts that involve regular weekend work (Demerouti et al. 2004).

Some studies have found that problems with combining shift work and family life
are greater for women than for men. In particular, women with young children report
shorter and more interrupted sleep after night shifts, as well as accumulated
tiredness (Costa 1996).

6.8.5 Jet Lag 

On long-haul flights, whether eastbound or westbound, several time zones may be
crossed. Longer trips produce a greater mismatch with the biological clock, as well
as longer work hours for pilots and crew. Common symptoms of jet lag include
difficulty sleeping at the desired times, daytime sleepiness, problems with concentration
and motivation, reduced cognitive and physical abilities, headaches, and irritability.
Reduced appetite and digestive problems may also occur.

It takes time to adapt to a new time zone. As a rule of thumb, resetting the
biological clock takes about 1 day per time zone crossed. Thus, after a flight
from Oslo to New York (a 6-hour time difference), it would take about 6 days to adjust.
However, crew members are often stationed for a shorter period than is needed
to adapt to the new rhythm; the stay is interrupted by the return flight or
perhaps another flight to a new time zone. Jet lag is somewhat less severe after
westbound flights than after eastbound flights. For most people the biological clock
runs slightly longer than 24 hours, so the body adapts more easily to a slightly longer
day (traveling west) than to a slightly shorter one (traveling east): It is easier to stay
up late and postpone sleep than to force oneself to sleep earlier than usual.
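The rule of thumb above amounts to simple arithmetic. The short Python sketch below is only an illustration under stated assumptions: it takes the text’s figure of about 1 day of adjustment per hour of time difference, the function name and the use of UTC offsets are hypothetical choices made for this example, and it ignores daylight saving time and routes that cross the international date line.

```python
def estimate_adjustment_days(departure_utc_offset: float,
                             arrival_utc_offset: float,
                             days_per_zone: float = 1.0) -> tuple[float, str]:
    """Hypothetical helper: rule-of-thumb estimate of jet lag adjustment time.

    Assumes about 1 day of adjustment per hour of time difference and
    ignores daylight saving time and routes crossing the date line.
    """
    shift = arrival_utc_offset - departure_utc_offset
    # A negative shift means local clocks are set back: a westbound trip
    # and a longer day, which most people adapt to somewhat more easily.
    if shift < 0:
        direction = "westbound"
    elif shift > 0:
        direction = "eastbound"
    else:
        direction = "no time change"
    return abs(shift) * days_per_zone, direction


# Oslo (UTC+1) to New York (UTC-5) in winter: a 6-hour difference.
print(estimate_adjustment_days(1, -5))  # prints (6.0, 'westbound')
```

The sketch returns the direction only as a label and does not shorten the westbound estimate, since the text says westbound trips are somewhat easier but gives no figure for how much.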
Exposure to light at certain times of day may ease adaptation to a new time zone.
The use of melatonin to counter jet lag is controversial for aviation personnel
and, in fact, discouraged. First, the evidence for its effectiveness is insufficient, and
second, negative side effects such as reduced attention span during work hours have
not been ruled out (Nicholson 2006).

6.8.6 How Can One Prevent Drowsiness? 

When a person is required to stay awake for long periods or during night shifts,
performance deteriorates and the probability of errors increases. Particularly at times
when there is not much to do (e.g., on long-haul flights), staying awake may become
a problem. There are various countermeasures against sleepiness, such as taking a power
nap (a short nap) or engaging in some physical activity, although this may not always
be possible due to the nature of the job. Some substances affect both wakefulness and
performance; caffeine is a well-known example. If possible, switching to a different
type of task for a while can help. Some people find it worthwhile to take a nap before
a night shift, but others find it difficult to sleep in the afternoon, perhaps because
of family commitments such as taking children to soccer practice. In general, it is
important to sleep well between shifts (good sleep hygiene).

6.9 EXTREME STRESS 

Persons affected by accidents in aviation, whether directly or indirectly (e.g., when close
colleagues die in a plane crash), are exposed to stress that is different from
everyday stress. Extreme stress reactions range from a strong feeling of unreality
immediately after the event to an apparent absence of any reaction. However, the
long-term effects must also be considered; although an individual may seem to handle
the situation well initially, it may take time for a reaction to manifest itself.

Persons exposed to trauma are at risk of developing posttraumatic stress disorder
(PTSD), characterized by distressing thoughts or dreams in which the trauma is
relived. Affected individuals feel numb and avoid situations that remind them of the
accident. Restlessness, nervousness, and sleep problems are also common. Posttraumatic
stress disorder can reduce the affected individual’s ability to perform
duties at the same level as before the accident. In some cases, the condition is
characterized by fear, helplessness, aggression, and/or a hostile attitude. It is not yet
clear why the condition is characterized by negative emotions; however, a possible
explanation is that people who have been exposed to a traumatic event have a lower
threshold for perceiving a situation as threatening. This leads to anxiety and avoidance,
but also to aggression and a greater readiness to go on the attack.

The link between the severity of PTSD symptoms and aggression is stronger in
individuals traumatized by acts of war than in individuals experiencing other
traumatic events (Orth and Wieland 2006). In most cases, PTSD symptoms will
subside in the weeks and months following the experience, although a few individuals
develop a chronic condition associated with depression, substance abuse, anxiety,
and inability to work. Even though most people regain their ability to work after a
traumatic incident, it is important to be able to identify individuals who are at risk
of developing PTSD. It is safe to assume that social interventions, such as care and
support from colleagues and management, will benefit many of those affected.
Population studies from the United States have shown that 50-60% of the
population has experienced a traumatic event at least once in their lifetime, but only
a small proportion (5-10%) develop PTSD (Ozer et al. 2003). Therefore, it is of interest
to explore the factors or conditions that help explain why some people develop
PTSD and others do not.

A meta-analysis of studies examining these factors found that
individual factors, such as a history of mental health problems and prior
exposure to trauma, were associated with more severe PTSD symptoms. Other factors,
such as perceived threat to life during the traumatic event, social support,
and the emotions experienced during the event itself, were also associated
with PTSD symptoms (Ozer et al. 2003).

With regard to vulnerability factors, those who were directly involved in the
accident are considered more vulnerable than those not directly exposed to the event.
Persons not appropriately trained, for example for the necessary rescue
efforts, are more likely to have stronger and longer-lasting reactions. In addition,
the severity of the situation (e.g., as measured by the number of dead and injured or
by intense and overwhelming sensory impressions) is likely to increase the probability
of developing PTSD.

All organizations should have routines for dealing with accidents and for
caring for those involved afterward. The most common approach is to review
the event with those involved within 48-72 hours. This involves
meeting in groups led by someone trained in such exercises. Typically,
participants go over exactly what happened and share thoughts and emotions,
and they are informed about common reactions to accidents or dramatic events.
These measures are intended to minimize acute symptoms and help those involved
cope better with their reactions in the time to come.

Another benefit of such group interventions is that they help identify
people in need of extra counseling and care. For those who need more support, there
are several types of individual therapy, such as cognitive-behavioral
therapy, which includes an exposure component and provides a structured framework
for dealing with thoughts and emotions. Other cognitive-behavioral methods focus on
helping the person manage anxiety. A different form of therapy is
eye movement desensitization and reprocessing (EMDR), in which the patient
visualizes the traumatic event while watching a moving stimulus.

Meta-analyses have found that therapy is an effective treatment for PTSD
and that up to 67% of those who complete treatment no longer meet the criteria
for a PTSD diagnosis (Bradley et al. 2005). It is important to encourage
affected individuals to keep working and maintain regular exposure to the
situation, for example, by returning to aviation. Studies have revealed that those who
return to work are better off in the long term than those who choose to change
professions, even among individuals who initially had similar symptoms.
6.10 PASSENGER REACTIONS

Most of the intended readers of this book love to fly, but others consider air travel 
best avoided. Some people become argumentative or quarrelsome, risking a fine or 
imprisonment for their behavior. The final part of this chapter will therefore deal 
with distressed and disorderly passengers. 

6.10.1 Fear of Flying 

Extreme fear of flying, or flight phobia, involves a fear of flying that is exaggerated
relative to the real risk involved. Affected individuals travel by air in great discomfort
or avoid it altogether. Some have to travel by air as a work requirement, in which case
fear of flying can be particularly inconvenient. Fear of flying can also be a significant
impediment to holiday plans. In a survey of randomly selected
Norwegian nationals, about one in two said he or she was never afraid of flying,
while the remainder indicated varying degrees of discomfort (Ekeberg, Seeberg,
and Ellertsen 1989). International studies indicate that between 10 and 40% of
passengers are affected by fear of flying (the large variation in this figure is
probably due to the phrasing of questions and the composition of the groups surveyed).
A Norwegian survey (Martinussen, Gundersen, and Pedersen 2008) asked participants:
“To which degree are you afraid of flying?” The results are outlined in Table 6.2.

The group surveyed consisted of students as well as passengers surveyed at an
airport in northern Norway. The survey found no correlation between age and fear
of flying, but it did find that women were more likely than men to report fear of flying.
Participants reporting any form of discomfort during flight were asked
which factors caused the greatest fear; commonly, these were cabin
movement, vibrations, noises, or announcements of turbulence. Participants were
also asked how they reacted in such situations. Typical reactions were
palpitations and the belief that something was going to go wrong. In addition, many
reported paying close attention to aircraft noises and closely watching the behavior of
cabin crew.

TABLE 6.2
Extent to Which the Survey’s Participants Are Afraid of Flying

Response                        Total (N = 268)
Not afraid at all               56%
Sometimes a little afraid       29%
Always a little afraid           8%
Sometimes very afraid            4%
Always very afraid (a)           4%

(a) Combined category based on three alternatives, all involving “very afraid.”

Even aviation personnel can develop a flight phobia. Although the prevalence of
this problem is unknown, one study examined a group of aviation personnel who
were seeking help for various psychological problems. Fourteen percent reported
fear of flying, a quarter of whom said they had experienced an accident or knew
someone who had been involved in one. In addition to fear of flying, about half of the
individuals in this group had been diagnosed with a mental disorder such as depression
(Medialdea and Tejada 2005). The same may apply to passengers suffering
from flight phobia; that is, other problems such as claustrophobia (fear of confined
spaces) or other psychiatric disorders may be involved as well.

6.10.2 Symptoms 

Symptoms of fear of flying are the same as those of other forms of anxiety and may
include palpitations, dizziness, chest pressure, paleness, cold sweat, and a strong
need to use the bathroom. Physical symptoms are often accompanied by feelings of
unreality, feeling faint, or thinking that one is about to go insane. Phobias are normally
characterized by high levels of anxiety in ordinary situations that most people handle
without difficulty, the absence of a rational explanation for the response, a loss of
control over the response, and insufficient coping strategies to deal with it.

A number of instruments have been developed to measure fear of flying. Examples are the flight anxiety modality (FAM) questionnaire and the flight anxiety situations (FAS) questionnaire (Van Gerwen et al. 1999). In both questionnaires, respondents are asked to rate each item on a five-point Likert scale. The following sample items from the FAM illustrate two categories: items 1-3 relate to physical symptoms, and items 4-6 concern thought processes:

1. I am short of breath. 

2. I feel dizzy, or I have the feeling that I’m going to faint. 

3. I have the feeling that I’m going to choke. 

4. I think the particular plane I am on will crash. 

5. I attend to every sound or movement of the plane and wonder if everything is fine. 

6. I continuously pay attention to the faces and behavior of the cabin crew. 

Such instruments can be used to establish the extent to which someone is afraid of flying, and they may also be used in research on the effectiveness of treatments. Physiological measurements (e.g., of the subject's heart rate) can also complement the self-report.
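To make the idea of rating items on a five-point scale more concrete, here is a minimal sketch of how answers to the six FAM sample items above might be summarized by category. It assumes, purely for illustration, that each item is rated from 1 to 5 and that a category score is the mean of its items; the function name `fam_subscale_means` and this averaging rule are hypothetical and are not taken from the published FAM scoring instructions.

```python
from statistics import mean

# Hypothetical grouping of the six FAM sample items listed above:
# items 1-3 concern physical symptoms, items 4-6 concern thought processes.
SUBSCALES = {
    "physical": (1, 2, 3),
    "cognitive": (4, 5, 6),
}


def fam_subscale_means(responses: dict[int, int]) -> dict[str, float]:
    """Average 1-5 Likert ratings per category (illustrative scoring only)."""
    # Reject ratings that fall outside the assumed five-point range.
    for item, rating in responses.items():
        if not 1 <= rating <= 5:
            raise ValueError(f"item {item}: rating {rating} is outside 1-5")
    # Mean rating of the items belonging to each category.
    return {
        name: mean(responses[item] for item in items)
        for name, items in SUBSCALES.items()
    }


if __name__ == "__main__":
    # One hypothetical respondent: few physical symptoms, strong worry.
    answers = {1: 2, 2: 1, 3: 1, 4: 4, 5: 5, 6: 4}
    print(fam_subscale_means(answers))
```

Keeping the two categories separate, rather than summing all six items, reflects the distinction drawn above between bodily symptoms and anxious thoughts; a researcher comparing treatments could then see which kind of reaction changes.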

6.10.3 What Is There to Be Afraid of? 

One may ask what people suffering from flight phobia worry about. Normally, they are concerned about conditions outside their control, such as poor weather, technical failures, and human error (by pilots or air traffic controllers). In other cases, the worry centers on one's own reactions and the prospect of behaving unusually, such as fainting in the aircraft. Some find the whole experience leading up to a flight, including check-in, queues, delays, and time-consuming security checks, to be worse than the flight itself. On the other hand, air travel can also offer positive experiences, although this aspect has been studied less than its negative implications.

6.10.4 Treatment 

The chosen course of treatment depends on whether fear of flying is the only problem or whether the person also has other phobias, depression, or another mental illness. Studies show that people with personality disorders benefit from standard treatments for flight anxiety (Van Gerwen et al. 2003). Thus, people with mental illnesses should not avoid treatment for aviophobia, although such programs are unlikely to provide the full treatment required for their other disorders. A review of treatment programs found that most involved some form of assessment of the individual's problem (Van Gerwen et al. 2004). Typically, treatment is relatively short (1-3 days) and takes place in a group setting; the content usually consists of relaxation exercises combined with some form of cognitive-behavioral therapy. In most cases, therapy concludes with exposure to flying in flight simulators or real aircraft. Recently, a number of studies have used personal computers as a medium for exposure to flying. These programs simulate a flight using visual, auditory, and physical stimuli (vibrations), combined with relaxation techniques and a framework for structuring emotions and stopping unwanted trains of thought.

When asked which methods they use to help them relax, about 37% of passengers 
respond that they sometimes or often use alcohol to reduce anxiety related to flying. 
A total of 47% seek distractions and just below 10% use some form of medication. 
Booklets and other information have been developed to advise passengers with fear of flying on how to prepare for and cope with a flight. This advice is based on recommendations produced at a conference (Airborne, Vienna, December 2000) at which a number of flight phobia experts convened to discuss the topic (quoted in Van Gerwen et al. 2004). A number of organizations also promote the study and treatment of flight anxiety (such as The Valk Foundation, accessible at http://www.valk.org).

Advice to sufferers of fear of flying from Van Gerwen et al. (2004, p. 33) includes: 

• Avoid caffeine, sugar, nicotine, and self-medication. 

• Practice relaxation. 

• Drink plenty of water and avoid alcohol. Alcohol does not decrease but rather 
increases fear and contributes to dehydration. 

• Pay attention to your breathing and regularly carry out your breathing exercises. 

• Turbulence is uncomfortable, but safe when your seatbelt is fastened. 

• Stop the “what ifs” and focus on “what is.”

• Keep flying. Do not avoid it. 

• Motivation is the key to change. 

• Planes are designed and built to fly. 

• Write reminders of the personal coping instructions that work for you on cards.
6.11 THE PAINS AND PLEASURES OF AIR TRAVEL

After the September 11, 2001, attacks on the United States, many people avoided air 
travel for a while, causing passenger figures to plummet. The number of travelers 
had more or less rebounded by 2003; however, the number of Americans traveling 
abroad was still in decline (Swanson and McIntosh 2006). In a British survey, a large 
majority of respondents (85%) said the September 11 events did not influence their 
future travel plans (Gauld et al. 2003). Such terrorist attacks are probably regarded as isolated events, leading most people to believe that they will not happen to them.

For some, however, such attacks will be a source of concern, especially when combined with media attention on other health risks such as deep vein thrombosis and cardiovascular and infectious diseases, including the recent emergence of severe acute respiratory syndrome (SARS). The flight-related increase in the risk of blood clots, for example, is extremely small except for individuals belonging to a particularly susceptible group (Bendz 2002; Owe 1998); however, there is often a difference between the real, objective risk that something bad will happen and the subjectively perceived risk. People also differ considerably in how risk-averse they are. Typically, people are willing to take greater risks when they are in a position of control or influence, which is rarely the case in air travel unless the person is piloting the aircraft.

In addition, everyday events or factors may act as stressors, for example, the purpose of the trip itself, such as having to do something one does not look forward to or leaving one's family for a long period of time. Other stressors include the trip to the airport, check-in, and security controls, where procedures are constantly changing and becoming more stringent. It may be difficult to estimate how long boarding will take or whether there will be delays. Sometimes several things go wrong at the same time, causing stressors to accumulate. People also differ in personality and coping techniques and will therefore react in different ways to identical events, much in
the same way that people react differently to work-related stress. In a British study, 
travelers were asked which factors led to anxiety, and the most common responses 
were delays (mentioned by 50% of respondents) and boarding the plane (mentioned 
by 42% of respondents) (McIntosh et al. 1998). 

In a more recent study, Norwegian passengers were asked about the positive and negative aspects of air travel (Martinussen et al. 2008). A total of 268 individuals, some recruited among travelers and some at the University of Tromsø, participated in the survey. Participants were asked how they usually experienced check-in and security controls. Of the respondents, 24% said they experienced check-in as either moderately or very stressful, and 11% reported dissatisfaction with the way check-in was conducted. With regard to security controls, 24% were dissatisfied with the way they were conducted, and 40% reported experiencing this aspect of air travel as moderately or very stressful. Those who felt unable to influence events and felt that information was lacking at the airport were likely to be more stressed during check-in and when going through security; these respondents also took less pleasure in air travel and were more anxious about it (Martinussen et al. 2008).

Participants were asked about the positive sides of air travel; 72% said they looked 
forward to it, and 56% reported excitement about being in an airport. In other words, 
most people see positive sides of air travel, although there are negative aspects as well. 

Swanson and McIntosh (2006) have proposed a model of stress related to air travel. The model describes various factors that may be considered demands or stressors to which passengers are exposed. These factors involve check-in, security controls, the flight itself, and conditions in the cabin (e.g., poor air quality and limited space). The moderating variables are personality traits, coping mechanisms, and demographic variables such as gender, which may enhance or reduce the effects of stressors. The model also describes possible aftereffects of flight-related stress, such as physiological reactions, health complaints, aviophobia, and even anger and frustration. The model resembles general stress models, with the notable exception that it is not yet supported by much research, although it is highly persuasive. It can be extended to include resources that are thought to alleviate stress, such as access to information and influence (or control) over the situation.

6.12 UNRULY PASSENGER BEHAVIOR 

At 10,000 feet, a female passenger attempted to open the rear exit door of an aircraft traveling from Zurich, Switzerland, to Copenhagen, Denmark. Presumably, the woman was mentally ill and wanted to take her life. The cabin crew managed to overpower her, and she was handed over to the Copenhagen police upon landing (Dagbladet, November 23, 2006). Another episode reported in the media involved a Norwegian man on a flight from Bangkok to Copenhagen who began to behave in a rowdy manner (Dagbladet, November 13, 2003). The man was under the influence of alcohol and wanted to leave the aircraft. He was particularly loud, aggressive, and uncooperative. Several passengers got involved in the struggle to calm him down, and a doctor on board gave him a sedative injection, which seemed to have little effect. The unruly passenger was finally brought under control and strapped to his seat for the remainder of the flight, after which the Copenhagen police took over the matter.

These are but a few examples of situations in which aircraft passengers display 
behavior that violates the rules and regulations of air travel. Presumably, the media 
only cover more serious events, leaving minor incidents unreported. Examples of the 
latter typically include verbal abuse such as uttering insulting, harassing, or sexually 
charged statements to cabin crew members or fellow passengers. 

Why did the woman want to leave the aircraft? Was she trying to take her life, or was she simply unaware of her surroundings? Similarly, one may wonder what provoked the episode involving the heavily intoxicated man on the flight from Bangkok
and how alcohol affected the situation. In these types of scenarios, the consequences 
for the cabin crew and other passengers getting involved in or observing the incident 
must be considered. We will return to these questions after taking a closer look at 
the term “air rage.” 
6.12.1 What Is Air Rage?

Aggressive passengers or customers are not unique to aviation; they may be regarded as a form of work-related violence, which occurs in many professions, including health care, education, and service industries. This chapter considers only work-related violence initiated by passengers (not by co-workers). Such incidents are often triggered by real or imagined inadequacies in the service provided or by a failure to meet demands, such as issues with seat reservations or requests for more alcohol. In some cases, the passenger may object to instructions or refuse to accept current regulations pertaining to smoking or seat belts. What is unique about aviation is that a disorderly passenger cannot simply be “set off at the next stop,” and that additional assistance is difficult to obtain: as soon as the passengers have boarded and the aircraft has taken off, the crew is left to handle problems on its own.

Generally, a distinction is made between air rage and unruly behavior or misconduct. An unruly passenger normally refuses to accept the regulations that apply aboard the aircraft. He or she may engage in threats, abusive language, and/or noisy and inappropriate behavior, and refuse to follow instructions given by crew members. However, a disorderly passenger remains nonviolent. The notion of air rage applies to cases that involve physical violence. Disorderly passengers and air rage are a worldwide problem and, in the worst case, can threaten safety. NASA investigations revealed that errors occurred in 15 of 152 cases in which one of the pilots had to help subdue a disorderly passenger or was called on by the cabin crew for help. Such errors included flying at the wrong altitude or choosing the wrong runway, both potentially dangerous situations.

In addition to being a safety threat, air rage and unruly passengers often cause discomfort for other passengers, the cabin crew, and, potentially, the pilots. Additional landings and delays may be necessary. In some cases, the situation may escalate from dealing with a disorderly passenger to dealing with air rage (i.e., the person ends up resorting to violence). In other cases, there may be few indications that something is wrong, and a seemingly unprovoked attack on the crew may occur. An example of the latter was a Norwegian Kato Air flight in 2004 from Narvik to Bodø (two cities in northern Norway), in which a passenger attacked the pilots with an ax; a full-blown disaster was only narrowly avoided. The passenger was later sentenced to 15 years' imprisonment. Such events may appear similar to terrorism; however, disorderly passengers and air rage are not considered to be politically or ideologically motivated.

6.12.2 How Frequently Does Air Rage Occur? 

It is difficult to judge the extent of the problem of air rage and whether or not it is on the rise, mainly because incidents have been poorly recorded in many countries. The phenomenon first gained widespread attention in the mid-1990s. It is hard to determine whether people started taking notice of the problem because occurrence rates were rising, or because changes in the aviation industry were causing customer dissatisfaction and, in turn, more disorderly passengers. Currently, little empirical evidence is available in this field. One possible explanation for this gap is that it is difficult to define what constitutes unruly behavior (or air rage) and that the recording methods used by various airlines have not been standardized.

An analysis of the available statistics revealed disparities between airlines, although several reported an increase in air rage and disorderly behavior (Bor 1999). British Airways reported 266 occurrences during one year (1997-1998), in which a total of 41 million passengers were transported. The probability of witnessing or experiencing air rage thus seems quite small. However, these figures may be incomplete; it is reasonable to assume that only the more serious events were reported.
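As a rough check on the claim that air rage is rare, the British Airways figures can be expressed as a rate per passenger carried. This simple calculation treats the 266 reported occurrences as the numerator and all 41 million passengers as the denominator; it says nothing about how many passengers witnessed each incident, and unreported events are, by definition, missing from the count.

```latex
\[
\frac{266}{41 \times 10^{6}} \approx 6.5 \times 10^{-6}
\quad\Longrightarrow\quad
\text{about 6.5 reported occurrences per million passengers, or roughly 1 in 154{,}000.}
\]
```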

6.12.3 What Causes Air Rage and Unruly Behavior? 

A number of factors are said to contribute to, or be associated with, disorderliness 
in passengers; however, data to support these claims are lacking, and conclusions 
are often based on information from secondary sources (such as the media). A study 
based on several hundred media reports on the issue found that, essentially, three 
factors were associated with disorderly or violent passengers: alcohol, nicotine 
deprivation, and mental illness (Anglin et al. 2003). These are all factors that can 
be amplified by environmental stressors, such as confined spaces, poor air quality, 
delays, and, perhaps, poor service. A typical example of air rage involves a male 
passenger with a history of violent behavior who chain smokes and consumes a large 
amount of alcohol daily. He continues drinking on board and is infuriated when 
refused the opportunity to smoke. The study estimated that alcohol was a contributing factor in 40% of cases (Anglin et al. 2003).

Other studies based on reports from companies in the United States and Britain have found that alcohol was a contributing factor in 43% and 50% of the reported cases, respectively (Connell, Mellone, and Morrison 1999). In nearly half of the incidents, the violent passenger had started drinking before boarding the flight. Although we have limited knowledge about how alcohol interacts with other factors, it is unlikely to be the only factor. More likely, alcohol reduces the person's inhibitions and impairs judgment; in a high-stress situation, this may lead him or her to resort to aggression as a way of solving problems.

Generally, few data are available to describe typical characteristics of disorderly passengers. They may be traveling alone, as a couple, or in a group. Socioeconomically well-placed individuals, such as artists or lawyers, have sometimes been involved in incidents of verbal and physical abuse. Perhaps such people are unaccustomed to taking orders or guidance from others; they may not accept instructions from someone they perceive as being of lower rank than themselves. The cabin crew, on the other hand, must alternate between serving passengers and being responsible for on-board safety. Some passengers may find this dual role difficult to accept because they expect service rather than instructions or commands.

The phenomenon of air rage and disorderly passengers may be better understood in light of general stress theory, with the addition of a number of contributing factors. On this view, disorderliness results from exposure to a sequence of events, each of which acts as a source of frustration (delays, queuing, and security checks). These are combined with alcohol and a triggering event, such as not being allowed to take carry-on luggage on board. Moreover, many people are anxious about flying to some extent, which may also play a role. In some cases, the cabin crew's behavior and communication with the disorderly individual can contribute to the incident developing in a negative direction. For example, raising one's voice may be necessary to be heard over the considerable background noise in the cabin; however, the upset individual may perceive this as reproachful and impolite.

6.12.4 What Can Be Done to Prevent Air Rage? 

Airlines provide cabin crew with training in dealing with disorderly passengers, including worst-case scenarios that involve physically strapping passengers to their seats. Some companies practice zero-tolerance policies in the event of air rage, resulting in police prosecution and denial of future flights with the company. It is also important to train cabin crew in how to manage frustrated and upset passengers in general. Naturally, crew members are required to respond to complaints and expressions of inconvenience in a polite and respectful manner. Providing information about delays and other irregularities is also important. Part of the training involves recognizing and preempting problems caused by passengers suffering from mental illness or under the influence of alcohol. Crucial preventive measures include refusing intoxicated persons permission to board and limiting access to alcohol in airports and, not least, on board. Training may also consist of learning how to calm a person down politely without causing the situation to escalate.

6.13 SUMMARY 

This chapter has discussed individual differences and common patterns in how people react to different stressors in life. We have established that when the experienced level of stress exceeds a person's ability to cope, various emotional, cognitive, and physiological reactions emerge. These reactions are significant for one's general health, work achievements, performance, and job satisfaction. Stress has both short-term and long-term effects on the individual, and it is important to be familiar with these effects both for one's own sake and because most aviation professions demand significant cooperation with colleagues and others. The chapter has focused mainly on people working in aviation, although passenger issues have also been described to some extent.
7 Culture, Organizations,
and Leadership 


7.1 INTRODUCTION 

This chapter discusses organizational and cultural factors and how these factors influence people working in aviation. The aviation industry is an international business in which individuals with different cultural backgrounds must work together to make sure that aircraft arrive at their destination safely and on time. Communication problems can lead to irritation and disagreement and may even have serious safety repercussions. Communication and coordination are always demanding, especially when people have different cultural backgrounds, genders, and languages. Toward the end of this chapter, we discuss organizational change and leadership and how these influence employees and the jobs that need to be done.

7.2 DO ORGANIZATIONAL ISSUES PLAY A ROLE IN ACCIDENTS? 

In recent decades, a number of major accidents have occurred, such as the
1986 meltdown at the nuclear power station in Chernobyl, the explosion at the North 
Sea Piper Alpha oil rig in 1988, and, more recently, the space shuttle Columbia 
disaster (2003). These accidents have in common that organizational factors were 
mentioned as contributing causes (Pidgeon and O’Leary 2000). The constructs of 
culture or safety culture are often mentioned as part of an explanation of what causes 
accidents or problems within the organization. 

In several preceding chapters, we have focused on individuals and individual differences, including selection and training and their importance for a person's performance. In this chapter, we will look at organizational issues and aspects of the system that may be significant for how a person acts and, not least, the consequences of these actions for safety.

Major accidents, whether they are plane crashes or nuclear disasters, have tremendous human, economic, and environmental consequences. Thus, attempts to uncover the causes of accidents are nothing new. Wiegmann and colleagues (2004) describe several historical stages in how such accidents have been explained. The first stage (the technical period) was marked by rapid technological development, during which investigators looked for inadequacies and outright flaws in technical systems. In the next phase (the period of human error), the focus shifted toward human error, and investigators sought to find errors made by the human operator. This was followed by the sociotechnical period, in which the interface between human operators and technology was examined. In the final phase, which the authors labeled the organizational culture period, individuals are no longer regarded as operators of machinery isolated from the world at large, but rather as workers in a team within a given cultural context.

7.3 WHAT IS CULTURE? 

The term “culture” has been associated with organizations since the early 1980s. There are about as many definitions of culture as there are publications about it. By “organizational culture,” one normally means, somewhat imprecisely, “the way things are done around here.” A more formal definition can be found in the book on organizational culture by Henning Bang (1995, p. 23): “Organizational culture is the set of commonly shared norms, values, and perceived realities which evolve in an organization as its members interact with each other and their surroundings.” Schein (1996, p. 236) defines the term as the “set of shared, taken for granted implicit assumptions that a group holds, and that determines how it perceives, thinks about, and reacts to various environments.” These definitions differ in that the former describes how the culture takes shape (i.e., through interaction), whereas the latter describes its effect on the members of the group.

There are also a number of related terms, such as climate. Many articles and empirical studies use these terms interchangeably, that is, without clarifying how they differ. Thus, researchers studying these phenomena are mapping approximately the same thing, although they appear to use different constructs to describe it (Mearns and Flin, 1999, provide a summary).

Schein (1990) considers climate to be a manifestation and measurable aspect of culture. Culture, on this view, is a deeper phenomenon that is not easily charted or categorized. Others say that whereas culture is what is shared, or common, among members, climate is a kind of “average” of the group members' experiences, particularly of interpersonal relationships within the organization. The last word has hardly been spoken on this matter. In part, the constructs likely have different histories and associated measurement techniques; however, the subjects of these studies are presumably overlapping phenomena. Climate is usually measured using standardized scales, an approach that critics consider insufficient for capturing culture (see, for example, Schein 1990). Methods such as interviews and observation are, then, the preferred alternatives. Most empirical studies of culture, however, have used questionnaires to measure the construct.

In these definitions of culture, the term “organization” applies both to businesses (e.g., airlines) and to groups composed in other ways, such as pilots (a profession) or subgroups within a business (e.g., women or technicians). It is also common to study national cultures, that is, the extent to which differences exist between nations. In aviation, therefore, we may assume that individuals are affected by several cultures: national, professional, and company (the airline for which the person works) cultures. Cultures may develop in many different social systems as people interact over time. According to Schein (1990), one condition for a culture to develop is that the individuals must have worked together long enough to have experienced and shared important problems. They must have had the opportunity to solve these problems and observe the effect of the solutions they implemented. Last, but not least, the group or organization must have taken in new members who have been socialized into the way the group thinks, feels, and solves problems. The advantage of having a culture is that it makes events more predictable and gives things meaning, which may reduce anxiety in group members (Schein 1990).

Subcultures, which may conflict with or support each other, can also form within an organization. They may be based on profession, workplace (sea versus land), gender, or age. In the wake of corporate mergers, subcultures can form based on the formerly separate companies. In conflicts between subgroups, each side will typically view the other from a polarized, black-and-white perspective: “They are bad. We are good; we have the correct values.” One explanation for such conflicts may be that groups need to preserve their social identity and will defend themselves against those who want to destroy or threaten their culture (Bang 1995). However, when such conflicts are allowed to persist, they can be devastating for an organization and, in the worst case, harmful to the well-being and health of individuals. Subcultures arise in most organizations, and it is probably naive to think that conflicts will never arise between them. Presumably, how the organization and its leadership manage such conflicts is more important than preventing them from arising.

7.4 NATIONAL CULTURE 

Aviation is, in almost every respect, an international industry. This forces companies and individuals to interact with people from other cultures, many of whom have a first language other than English. National culture affects the way people communicate and act. The most popular model and method for studying these national differences is based on Geert Hofstede's questionnaire on work-related values (Hofstede 1980, 2001). Hofstede developed the measurement instrument in connection with an extensive study of IBM employees in 66 countries conducted from 1967 to 1973. Participants rated the importance of given values, or the extent to which they agreed or disagreed with given statements. The questions were grouped into four scales (power-distance, PD; uncertainty-avoidance, UAV; individualism-collectivism, IND; and masculinity-femininity, MAS). Descriptions of these scales, with corresponding sample questions, are given in Table 7.1.

Other researchers have used this instrument in cross-cultural studies, and a reasonable amount of support for the four dimensions has been established. However, the instrument has been criticized, notably for the low internal consistency of its scales (Spector and Cooper 2002; Hofstede 2002). Hofstede's study is nonetheless both impressive and important because of the great number of countries and individuals that participated and because it facilitates comparisons between his data and other findings.

A study by Merrit (2000) that included almost 10,000 pilots from 19 countries revealed that two of the four dimensions (IND and PD) were replicable, whereas there were issues with some of the original questions for masculinity and UAV. However, a clear correspondence was present between Hofstede's ranking of countries on the cultural dimensions and the scores of pilots from the various countries. In addition, there were some differences between pilots as a group and the IBM employees used as a reference by Hofstede. For example, pilots had higher PD than Hofstede's group, reflecting the clear-cut and generally accepted hierarchy between captains and co-pilots in the piloting profession.


TABLE 7.1

Hofstede's Scales for Measuring National Culture

Power-distance (PD)
What does it measure? The degree to which power is unequally distributed between managers and subordinates and the extent to which this is accepted. Low PD values have been recorded in Austria, Israel, and the Scandinavian countries; high PD levels have been found in the Philippines and Mexico.
Sample question: How often are employees afraid to express disagreement with their managers?

Uncertainty-avoidance (UAV)
What does it measure? The extent to which members of a culture feel threatened or anxious due to uncertainty and unpredictable situations. Countries including Greece, Portugal, and several Latin American countries report high levels of UAV; the United States, Singapore, Sweden, and Denmark report low UAV.
Sample question: How often do you feel nervous and tense at work?

Individualism-collectivism (IND)
What does it measure? The extent to which focus is on the individual (i.e., the individual's rights and responsibilities versus the group's). High levels of individualism are found in Western nations such as the United States; several countries in Asia have low scores.
Sample question: How important is it that you fully use your skills and abilities on the job?

Masculinity-femininity (MAS)
What does it measure? The extent to which the culture emphasizes efficiency and competition versus more social (feminine) values. Countries with high femininity scores include the Scandinavian countries; Japan and some nations in Southern Europe and Latin America have low femininity scores.
Sample question: How important is it to have security of employment?

Source: Based on Hofstede, G. 1980. Culture's consequences: International differences in work-related values. Beverly Hills, CA: Sage.

A study by Sherman, Helmreich, and Merrit (1997) found differences between countries in how pilots regard rules and procedures, the usefulness of automation, and the extent to which they accept a clear hierarchy (chain of command) between captains and co-pilots.
In another study, of military pilots from 14 NATO countries, the values on Hofstede’s scales were compared to accident statistics for those countries (Soeters and Boer 2000). Data were collected over a 5-year period (1988-1992), and the number of aircraft lost per 10,000 flight hours was used as the accident ratio. This number was then correlated with the national values for the four cultural dimensions. Three of the four dimensions were significantly correlated with the accident ratio: IND (r = -.55), PD (r = .48), and UAV (r = .54) (Soeters and Boer 2000). There was no significant correlation between the masculinity index and the accident ratio. The correlations increased when accidents due to mechanical failure were excluded. The numbers thus indicate higher accident rates in countries with low individualism scores and high power-distance and uncertainty-avoidance scores.
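The following Python sketch illustrates the two computational steps described above: expressing losses as an accident ratio (aircraft lost per 10,000 flight hours) and correlating that ratio with a national cultural score, with the country, not the pilot, as the unit of analysis. It is a minimal illustration only: the country names and all figures are invented and are not Soeters and Boer’s data, and the Pearson coefficient is computed by hand rather than with any particular statistics package.

```python
import math

def accident_ratio(aircraft_lost: int, flight_hours: float) -> float:
    """Aircraft lost per 10,000 flight hours."""
    return aircraft_lost / flight_hours * 10_000

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# HYPOTHETICAL data for four invented countries (NOT Soeters and Boer's figures):
# (aircraft lost, flight hours, national power-distance score)
countries = {
    "Country A": (3, 400_000, 30),
    "Country B": (5, 350_000, 45),
    "Country C": (8, 300_000, 60),
    "Country D": (6, 500_000, 40),
}

# One accident ratio and one cultural score per nation.
ratios = [accident_ratio(lost, hours) for lost, hours, _ in countries.values()]
pd_scores = [float(pd) for _, _, pd in countries.values()]

for name, ratio in zip(countries, ratios):
    print(f"{name}: {ratio:.3f} losses per 10,000 flight hours")

r = pearson_r(pd_scores, ratios)
print(f"r(PD, accident ratio) = {r:.2f} across {len(countries)} countries")
```

With only a handful of countries, changing the figures for a single nation can shift r considerably.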

The results are interesting; however, it is important to keep in mind that the number of countries involved was only N = 14 because the nations (not the pilots) were the subjects of the study. In other words, the correlations are based on a fairly small sample. Moreover, correlation is not the same as causation. Many other factors vary among countries, and these factors may cause the observed variations in the number of accidents. Finally, culture was not measured in the military pilots themselves; instead, the results from Hofstede’s study were used. The figures may therefore differ somewhat from what they would have been had culture been measured in military pilots during the same period in which the accident statistics were collected. Nevertheless, the study indicates that cultural factors and how they relate to accidents may be worth further investigation.
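The weight of the small-sample caveat can be shown with the standard t test for a Pearson correlation. This is the usual textbook formula, offered here only as an illustration; the source does not state which test Soeters and Boer applied. Below, the UAV correlation reported above (r = .54) is evaluated with the study’s N = 14 nations and, for comparison, with a hypothetical N = 100:

```latex
\begin{aligned}
t &= \frac{r\sqrt{N-2}}{\sqrt{1-r^{2}}}, \qquad df = N-2 \\[4pt]
N = 14,\; r = .54:\quad t &= \frac{0.54\sqrt{12}}{\sqrt{1-0.54^{2}}} \approx \frac{1.871}{0.842} \approx 2.22 \quad (df = 12) \\[4pt]
N = 100,\; r = .54:\quad t &= \frac{0.54\sqrt{98}}{\sqrt{1-0.54^{2}}} \approx \frac{5.346}{0.842} \approx 6.35 \quad (df = 98)
\end{aligned}
```

With 12 degrees of freedom, t ≈ 2.22 only narrowly exceeds the conventional two-tailed .05 critical value of about 2.18, whereas the same r based on 100 observations would give t ≈ 6.35. A correlation of this size from 14 nations is therefore a fragile basis for conclusions, and in either case the test speaks to statistical reliability, not to causation.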

A third study (Li, Harris, and Chen 2007) compared accident statistics and accident causes from India, Taiwan, and the United States. A total of 523 accidents, including a combined 1,762 cases of human error, were investigated. The study summarized results from earlier accident analyses that used the Human Factors Analysis and Classification System (HFACS; Wiegmann and Shappell 2003) to classify errors. This system is based on Reason’s model, which is discussed in greater detail in Chapter 8. The study found significant differences among countries in the reported causes. Organizational causes were reported more frequently in Taiwan and India (countries with high power-distance and low individualism scores) than in the United States. This suggests a hierarchy in which employees expect to be told what to do (to a greater extent than in Western nations) and collective decisions are preferred to individual ones. The authors see this as a possible explanation for the more frequent reporting of organizational errors: there is less spontaneous feedback in the system through open discussion, and subordinates have less authority and autonomy in decision making and, perhaps, in correcting errors and flaws.

7.4.1 Problems Relating to Study of Cultural Differences 

Studying national differences in culture, or in any other characteristic, is a complicated job. One challenge is conducting the study or survey in approximately the same way in very different countries. For example, is it possible to sample participants in a similar way, and is the same procedure used in all the countries included in the study? The more the countries differ, the more difficult this becomes. Regulations may differ as to which registries of the target group are available and whether permission will be granted to extract information from them. The groups must also be comparable: contrasting nurses from the Netherlands with Malaysian pilots would hardly provide a good basis for attributing any differences to culture alone. Ideally, the groups in question should be as similar as possible, even though researchers typically have to admit that, in practice, it is impossible to conduct a survey in exactly the same way in all countries. The more similar the groups are in terms of other variables, the more confidently differences can be attributed to cultural factors.

Another challenge is translating questionnaires between languages. Even when considerable effort goes into translation, a statement may convey a different meaning in another language, or certain words and expressions may have no equivalent in the target language. Often, the text is first translated from the original language into the target language (e.g., from English to Norwegian). Then someone else who is also proficient in both languages translates the text back into the original language (in this case, English). Finally, the two English versions are compared, which reveals ambiguities in the translation and shows what needs to be adjusted. The most important point here is that a perfect, word-for-word translation is not necessarily desirable; what matters is preserving meaning. A further problem in cross-cultural studies is that people from different cultures have different response styles. Respondents in some cultures may have a greater tendency to agree with a given statement, whereas in others opinions are expressed more freely and the extreme ends of the scale are used more often.

7.5 PROFESSIONAL CULTURE 

Many occupations and professions have strong cultural identities. This applies to psychologists, air traffic controllers, and pilots, to mention but a few examples. Often, there is fierce competition to be selected for, and to complete, the required (and often extensive) education. Upon completion of training, many members of the profession join powerful unions, which work to preserve the rights and interests of their members. Some unions offer members sponsored education to maintain or build additional skills; they also typically provide guidelines for ethical behavior within the profession. Unions thus help to socialize new members into the profession, in part by exercising some control over them. Professionals, including pilots, psychologists, and physicians, are often highly enthusiastic and proud workers. They make every effort to succeed, and few leave the profession once they have entered the workforce. On the other hand, according to Helmreich and Merrit (1998), a strong professional culture can give individuals a false sense of invulnerability and lead them to disregard their own limitations. Some people may be aware of human limitations in general but fail to recognize that these limitations also apply to them (a form of unrealistic optimism).

A study by Helmreich and Merrit (1998, p. 35) revealed that a large proportion of pilots and physicians strongly believed that they could do their job just as well under below-average conditions: for example, that their judgment was as sound in an emergency as under normal conditions, or that they could put personal problems on hold while working. There were also some differences between the groups; 60% of doctors stated that they still worked efficiently when tired, compared to only 30% of pilots. Overall, the doctors’ perceptions were somewhat less realistic than the pilots’. The cause of this difference is not yet clear. However, the long-standing focus on human factors in aviation may have had an impact. For example, pilots must complete mandatory crew resource management (CRM) courses, whereas such training is relatively new in the medical professions.

Within a profession, there may be subgroups with which an individual feels more 
or less associated. Subgroups may be based on specialization, workplace, and/or 
gender. Examples include military versus civilian pilots or clinical psychologists 
versus psychology professors. 

One study examined cultural changes in a major Norwegian airline, with a sample 
of 190 pilots (Mjøs 2002). Significant differences were found between the scores for
this group and national norms based on Hofstede's figures. The greatest difference was 
found for the masculinity index, where pilots scored much higher than expected. 

An international study of air traffic controllers from Singapore, New Zealand, and Canada investigated how they perceived their work environment (Shouksmith and Taylor 1997). The expectation was that controllers from an Eastern culture would differ more from those in the two Western countries. The controllers were asked what they considered stressful in their work environment, and many similarities were found. For example, controllers from all three countries listed technical limitations, periods of high traffic, and fear of causing accidents among their five most important sources of stress.

On the other hand, Singaporean air traffic controllers also listed problems with local management among their top five stressors, whereas the two other groups listed the general working environment. The authors attribute this difference partly to cultural factors such as higher power-distance in Asia than in Western nations, which makes disagreements between subordinates and management more serious. External environmental factors may also explain some of the differences; for example, more frequent bad weather in Canada could explain why Canadian controllers named weather as one of the most significant work-related stressors.

7.6 ORGANIZATIONAL CULTURE 

Just as there are national and professional cultures, there may be cultural variations between airlines operating in the same country. In a Norwegian study of three airlines (Mjøs 2004), significant differences were found in three of the four Hofstede dimensions (PD, MAS, and UAV). As part of the survey, the participating pilots (N = 440) were also asked to report any errors they had made during the past year. A total of 10 indicators, such as forgetting important checklist items or choosing the wrong taxiway, were included. For each pilot, a total error score was calculated and correlated with the stated cultural dimensions (PD, UAV, IND, and MAS). A strong association (r = .54) was found between PD and the number of operational errors (i.e., errors increased with perceived PD). As mentioned earlier, correlations are not necessarily evidence of cause and effect; however, the result is interesting. We can only speculate about the mechanisms behind such correlations. Kjell Mjøs (2004) suggests a model in which cultural variables affect the social environment in the cockpit, which in turn affects communication between pilots and can ultimately lead to operational errors.

Ten years after the first data were collected, a follow-up survey was conducted with a smaller selection of pilots from the largest airline in the previous study (Mjøs 2002). The purpose was to investigate potential changes in the airline’s culture over time. Mjøs found a significant change in the scores for the PD, IND, and UAV dimensions. The social climate had also improved in this period. Some pilots were evaluated while performing an exercise in a simulator, and fewer operational errors were recorded in the follow-up study than 10 years earlier. One of the study’s weaknesses was that it did not survey exactly the same people both times; however, this would obviously be difficult to accomplish given the 10-year gap between the two surveys. Additionally, because the surveys were anonymous, it would have been impossible to track individual scores over time. It is also important to be aware that the study explains neither the cause of the cultural changes nor whether the recorded performance improvements can be attributed to these changes.

7.7 SAFETY CULTURE 

Safety culture is a term that has been widely used in aviation and, to some extent, in other industries where the consequences of error are significant, such as high-tech manufacturing, nuclear power plants, operating rooms, and various modes of transport. Such systems often involve close interaction between technology and human operators, and errors may have disastrous consequences. There are a number of definitions of safety culture (for a summary, see Wiegmann et al. 2004). A relatively simple definition is that safety culture refers to the fundamental values, norms, assumptions, and expectations that a group shares concerning risk and safety (based on Mearns and Flin 1999).

As with organizational culture and organizational climate, the boundary between safety culture and safety climate is unclear. Some researchers regard them as different but related concepts: safety climate is measured using a questionnaire and provides a snapshot of how employees perceive safety (often in relation to a specific issue), whereas safety culture refers to more lasting and fundamental values and norms that partially overlap the national culture of which the organization is a part (Mearns and Flin 1999). In practice, however, the terms are used interchangeably, and quantitative questionnaires often overlap in content.

Wiegmann and colleagues (2004) have described a number of characteristics or assumptions shared by the different definitions of safety culture: safety culture is something a group of people have in common, it is stable over time, and it is reflected in the organization’s willingness to learn from mistakes, incidents, and accidents. Safety culture influences group members’ behavior, either directly or by affecting employees’ attitudes and motivation to behave in a way that enhances safety.
7.7.1 What Characterizes a Sound Safety Culture?

An important aspect of a sound safety culture is management commitment to, and involvement in, the promotion of safety. To achieve this, it is crucial that the highest levels of management make the necessary resources available and support the work involved. Safety must be reflected in all aspects of the organization, and routine evaluations and system improvements must take place. However, not only senior management but also lower-level managers are important, and they should participate in activities aimed at improving safety. Little is gained by sending employees to safety classes if those who monitor the implementation of routines do not participate.

Another indicator of safety culture is that those who perform the specific jobs are given the responsibility and authority to act as the last line of defense against errors. In other words, they feel empowered and regard their role as an important part of securing safety. This involves playing an active role and being heard in the work to improve safety. The organization’s reward system is another aspect. Are reward systems in place to promote safety, or are employees punished or ignored when they raise an issue? A final concern is the extent to which the organization is willing to learn from previous mistakes and whether employees are given feedback through the reporting system. Encouraging employees to report errors and mistakes but doing nothing to correct them would be very demotivating for those who report.

7.7.2 How Does a Safety Culture Develop? 

Safety cultures can be categorized in several ways, often on the basis of the elements mentioned in the previous section. Hudson (2003) has developed a model of safety cultures ranging from pathological to mature, or developed, cultures. The categories are presented in Table 7.2 with sample statements reflecting typical ways of regarding safety. This model expands on Westrum’s model (Westrum and Adamski 1999), which contained three stages or organization types: pathological, bureaucratic, and generative.


TABLE 7.2
Development of a Safety Culture

Type: Statement Typical of Culture
(High level of information flow and trust)
Generative: “Safety is the way we do business around here.”
Proactive: “We work on the problems that we still find.”
Calculative: “We have systems in place to manage all hazards.”
Reactive: “Safety is important. We do a lot every time we have an accident.”
Pathological: “Who cares as long as we are not caught?”
(Low level of information flow and trust)

Source: Based on Hudson, P. 2003. Quality and Safety in Health Care 12:7-12.





The five stages in Hudson’s model provide a framework for classifying safety cultures and describing various levels of cultural maturity. In pathological cultures, safety is seen as a problem caused by operators. Making money and avoiding being caught by the authorities are the dominant motivations. In reactive organizations, safety is beginning to be taken seriously, but only after an accident has already occurred. In calculative cultures, safety is maintained by various administrative systems and is primarily an issue imposed on employees. Proactive cultures are characterized by some degree of workforce involvement in safety work. In the most advanced type of safety culture (generative), safety is everyone’s responsibility. The focus on safety is an important part of how business is conducted, and even though there are few accidents or incidents, people do not relax or rest on their laurels but remain alert to dangers.

An advanced safety culture may also be described, according to Hudson (2003), by the four elements shown in Figure 7.1:

• Information means that all employees are informed and that information is shared within the organization. Importantly, bearers of bad news are not blamed; instead, employees are encouraged to report problems.

• Trust is developed by treating employees fairly and not punishing those who report errors and mistakes.

• Change means that the organization is adaptable and learns both from the times when something goes wrong and from the times when something works well.

• Worry (or concern) implies a constant apprehensiveness: an understanding that even when all precautions have been taken, something may go wrong.

FIGURE 7.1 Elements that characterize a sound safety culture: information, trust, change, and worry.

7.7.3 Studies of Safety Culture in Aviation 

Several questionnaires have been developed to map safety culture in aviation; most are multidimensional and consider many of the aspects mentioned in the previous section. A study of ground personnel in a Swedish airline surveyed nine aspects, including communication, learning, reporting, and risk perception (Ek and Akselsson 2007). In the survey, managers were also asked to estimate how they thought the employees would respond. Managers were more positive about the safety culture than operators were. At the same time, the two groups’ evaluations of the various aspects of the company’s safety culture were largely consistent. Both management and employees gave the lowest scores to the “justness” and “flexibility” dimensions. The former assessed the extent to which occasional mistakes were accepted, and the latter measured the extent to which employees were encouraged to suggest improvements. The highest scores were given for the dimensions “communication” and “risk perception” (Ek and Akselsson 2007).

7.8 WOMEN AND AVIATION 

Figures from the United States show that about 90% of today’s pilots are white males, which means that both women and other ethnic groups are underrepresented in the cockpit. There is reason to believe that this could change in the future job market, but how quickly it would happen is uncertain. Naturally, it is difficult to predict how the industry will change over time and what the need for various professions will be in the years to come. Several Norwegian companies have expressed concern about a future shortage of pilots, and figures from the United States projected an increase of almost 27% in the demand for commercial pilots by 2010 compared with the number employed in 2000 (U.S. Department of Labor, Bureau of Labor Statistics, quoted in Turney and Maxant 2004).

It is therefore reasonable to expect that airlines will recruit pilots and other personnel from outside national borders and from groups other than those who have traditionally chosen the piloting profession. It is difficult to know exactly what motivates young people in their choice of profession. A number of factors are probably involved, such as subjective assessments of their own abilities and interests, as well as external factors such as available opportunities, financial situation, and what they are familiar with through family and friends. Sometimes personal experiences may also play a part in staking out a career path, as was the case with one of the first Norwegian female aviators, Gidsken Jakobsen (Gynnild 2008):
One day in June 1928, a big, three-engine seaplane roared through Ofotfjorden and landed outside Narvik. Somewhere in the sea of spectators was Gidsken Jakobsen. The visit from above became a turning point in her life. “From the day I saw Nilsson’s [flying] machine at the docks, there was nothing I'd rather do than fly,” she said years later. “Imagine flying around in the air like Nilsson and his crew, from place to place, to get to know the country from the air and awakening the interest of thousands of people in what they loved more than anything else: To fly!”

Gidsken Jakobsen was raised in Narvik in the 1920s. She was not quite like other girls; she learned to drive cars and motorbikes at an early age. At the age of 21, she left for Stockholm to earn her pilot’s license at the Aero-Materiell flight school. Subsequently, she learned to fly seaplanes and, with the help of her father, bought a seaplane that was given the name Måsen (The Seagull) (Gynnild 2008). There is no doubt that Gidsken Jakobsen lived an untraditional and exciting life, with great determination and zest for life. Other examples of female pioneers can be found in Norway and in other countries, such as Dagny Berger and Elise Deroche, who got their pilot’s licenses in 1927 and 1910, respectively. Harriet Quimby crossed the English Channel in 1912 and Amelia Earhart crossed the Atlantic in 1932 (Wilson 2004).

Despite the fact that women were at the center of early aviation and that there were many female pioneers, the piloting profession of today is distinctly male dominated. Although the global rate of female participation in this profession is difficult to estimate accurately, it has been put at between 3 and 4% in Western countries (Mitchell et al. 2005).

It is difficult to say why more women are not attracted to flying. There are probably both individual reasons and reasons rooted in the nature of aviation. Perhaps the profession is seen as particularly masculine because its history is full of heroic achievements performed by men (“the right stuff”), so that young women choose other educational paths and career opportunities. Perhaps the thought of entering a trade where one risks being isolated, and perhaps having to put up with negative and sexist comments from colleagues and others, is less than appealing. Has the aviation industry been interested in allowing women into the cockpit, and what attitudes do female pilots meet? How can a female applicant to this profession expect to be received, and how do gender issues affect interactions in the cockpit, which has traditionally been occupied by two men? Research into interactions between pilots has long been concerned with communication and cultural differences and their impact on aviation safety. Still, research into gender issues in aviation, both in terms of attitudes toward female pilots and consequences for interactions between pilots, is insufficient.

7.8.1 Attitudes toward Female Pilots 

To provide information about attitudes toward female pilots, a study was initiated in South Africa, the United States, Australia, and Norway in which female and male pilots were asked how they regarded female pilots (Kristovics et al. 2006). The participants were asked to express their opinions on a number of statements about female pilots. In addition, the survey included an open question in which participants could express in their own words what they thought about the issues raised by the survey and about the survey itself. The survey contained four sections of questions (examples are presented in Table 7.3). Participants indicated their agreement or disagreement on a five-point scale. Results were then summarized for each dimension so that a high score represented a positive attitude and a low score a negative attitude. Some of the questions therefore had to be reverse-scored before the scores could be combined into a unified index.

TABLE 7.3
Examples of Questions from Survey on Attitudes toward Female Pilots

Decision/leadership
• Female pilots often have difficulty making decisions in urgent situations.
• Female pilots’ decision-making ability is as good in an emergency situation as it is in routine flights.

Assertiveness
• Male pilots tend to “take charge” in flying situations more than female pilots do.
• Male flight students tend to be less fearful of learning stall procedures than female students are.

Hazardous behavior
• Male pilots are more likely to run out of fuel than female pilots are.
• Male pilots tend to take greater risks than female pilots do.

Affirmative action
• Professional female pilots are only in the positions they are in because airlines want to fulfill affirmative action quotas.
• Flight training standards have been relaxed so that it is easier for women to get their wings.
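The reverse-scoring step described above can be sketched in Python. This is a minimal illustration under stated assumptions: responses run from 1 (strongly disagree) to 5 (strongly agree), a dimension score is taken as the mean of its items (the source says only that results were summarized per dimension), and the item labels are hypothetical shorthand for two decision/leadership statements from Table 7.3, not the survey’s actual coding.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # assumed coding: 1 = strongly disagree, 5 = strongly agree

def reverse(response: int) -> int:
    """Reverse-score a five-point response: 1<->5, 2<->4, 3 unchanged."""
    if not SCALE_MIN <= response <= SCALE_MAX:
        raise ValueError(f"response {response} is outside {SCALE_MIN}-{SCALE_MAX}")
    return SCALE_MIN + SCALE_MAX - response

def dimension_score(responses: dict[str, int], negative_items: set[str]) -> float:
    """Mean item score after reversing negatively worded items.

    Higher values indicate a more positive attitude toward female pilots.
    """
    adjusted = [reverse(value) if item in negative_items else value
                for item, value in responses.items()]
    return sum(adjusted) / len(adjusted)

# Hypothetical labels for two decision/leadership statements and one
# respondent's invented answers.
responses = {
    "difficulty_deciding_when_urgent": 2,   # negatively worded: agreement = negative attitude
    "decisions_as_good_in_emergencies": 4,  # positively worded: agreement = positive attitude
}
negative_items = {"difficulty_deciding_when_urgent"}

score = dimension_score(responses, negative_items)
print(f"Decision/leadership score: {score:.1f}")  # prints "Decision/leadership score: 4.0"
```

Without the reversal, the two answers (2 and 4) would average to 3.0 and hide the fact that both express the same positive attitude.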

A total of 2,009 pilots (312 women and 1,697 men) with an average age of 36 participated in the survey. The proportions of participants differed among the four countries: 53% were from Australia, 28% from South Africa, 9% from the United States, and 10% from Norway. Results revealed gender-related differences for all four dimensions; that is, male pilots regarded female pilots in a more negative light than female pilots did. Differences were greatest for the statements in the categories “decision/leadership” and “affirmative action.”

There were also differences between countries, as presented in Figure 7.2. The following results are based on male pilots only because the numbers of female pilots were very low in some participating countries. For three of the four dimensions, the Norwegian pilots were more positive than pilots from the other countries, whereas for the dimension labeled “hazardous behavior” the situation was reversed. Statements in the latter category were of the type “Male pilots tend to take greater risks than female pilots,” and Norwegian pilots agreed with these statements to a lesser extent. This probably reflects the egalitarian outlook of Norwegians, in which female pilots are not viewed as more careful or apprehensive than their male colleagues. In summary, the results of the survey revealed that male Norwegian pilots were more positive about female pilots than their U.S., Australian, and South African counterparts. Perhaps this expresses stronger ideals of equality in Norway than in the other countries in the investigation, which is in line with Hofstede’s (1980) findings for Scandinavian countries.
FIGURE 7.2 Differences in attitude in four countries (South Africa, USA, Australia, and Norway) on the dimensions decision/leadership, assertiveness, hazardous behavior, and affirmative action, based on male pilots’ responses.

The sample included 158 pilot instructors. As a subgroup, instructors were more positive toward female pilots than the sample as a whole. Additionally, those who had had the opportunity to fly with a female co-pilot were more positive toward female pilots than those who had not.

Participants had the opportunity to provide supplemental comments to the survey, and there were several similarities in negative comments from different countries (Mitchell et al. 2005). Some claimed that the aviation industry is not suited for women, exemplified in statements such as “A female pilot = an empty kitchen” and “If women were meant to fly, the sky would be pink.” The Norwegian group made the fewest negative comments. Positive comments often stated that female pilots were just as good as male pilots, or better, and that it was unreasonable to judge people solely on the basis of gender. Many Norwegian respondents (most of whom were military pilots) commented that because both males and females went through the same selection process, they were equally capable of performing the given tasks.

7.9 REORGANIZATION AND ADAPTING TO NEW WORKING CONDITIONS

Many industries are subject to restructuring and reorganization. Such changes are often attributed to a need or desire to improve the company’s efficiency and profitability. Sometimes, economic conditions or strategic decisions lead to downsizing. However, one problem with downsizing is that the tasks to be performed do not disappear with the dismissed staff; consequently, a greater workload rests on those who remain. In addition, new technologies are often introduced, so employees must be willing to learn new skills and perform new tasks. Industries that were previously regarded as firmly established and able to offer attractive working conditions are no longer plentiful, leading to more short-term planning by employers and employees.

Restructuring is widely regarded as a standard and appropriate strategy that, according to
McKinley and Scherer (2000), provides managers with a feeling of cognitive order,
a positive emotion that increases the probability of further restructuring. Although
organizational changes are usually carried out with the best intentions, studies point
to negative consequences in the form of stress and reduced work satisfaction among
employees, as well as lower than expected financial gains for the company (for a
review, consult Mentzer 2005). One interpretation of such outcomes involves insufficient
and inappropriate management of human factors; in other words, managers
underestimate the impact that extensive restructuring processes have on employees.

7.9.1 Reactions to Organizational Changes 

For employees, consequences of corporate reorganization may be considerable. Work 
is a very important part of most people’s lives, and significant changes in a company 
can be perceived as a critical life event. What individuals are most concerned about 
will vary, but may include questions such as 

• How will this influence my day-to-day work routine and my career? 

• Will my skills and experience be adequate in the future? 

• Will I be in demand in the new organization? 

• Will I be treated fairly in the reorganization process? 

Restructuring may be seen in light of general stress theories, such as
Frankenhaeuser’s bio-psychosocial model, which was mentioned in Chapter 6. In
such models, restructuring would be regarded as a stressor against which individuals
weigh their resources. If demands surpass resources, stress arises, which in turn
can have negative health implications. What makes such an evaluation difficult is the
scarcity of information about what will happen (and when). In the case of mergers
and acquisitions, employees are usually not informed until relevant decisions have
been made. Decision-making processes are typically kept secret on the grounds of
protecting company interests. Internal rumors about what is about to happen often
circulate, and this only makes it more difficult for employees to determine what will be
required of them in the future and which tasks they will face in the new organization.
It is common not to know whether one will have a job in the “new” company. That said, not
everyone reacts negatively to news of an imminent restructuring
or reorganization. Some people may consider such changes natural and appropriate,
with opportunities for them to “climb the ladder.”

In addition to individual reactions, what are the consequences of restructuring and
downsizing for the organization? A more productive and dynamic organization is the
intended outcome, but it cannot be taken for granted. Potential savings must
be weighed against possible negative consequences, such as increased absence
due to illness, higher turnover rates, and organizational unrest. Restructuring, and
downsizing in particular, leads to reduced trust in management and lower work
satisfaction among employees. How strongly employees react depends on a
number of circumstances. An important factor is the extent of the changes: the
merger of two large organizations, for example, differs considerably from the fusion
of two smaller departments within an organization. A decisive factor, though, is how the
changes are communicated, and in particular whether the necessity of change is emphasized.
Good communication contributes to reduced anxiety and fewer negative reactions, as
well as improved work satisfaction in employees.

A study of Canadian hospital employees (N = 321) compared their reactions
to various forms of organizational change, including structural changes, workplace
changes, and the introduction of new technology (Bareil, Savoie, and Meunier
2007). Direct comparisons of how subjects reacted were possible because all
subjects went through the same changes. About a quarter of them reacted
with the same degree of discomfort to all the different types of change, while the
majority (about three quarters) experienced varying degrees of discomfort according
to the type of change. Thus, for most employees, the type of change (a situational factor),
rather than personal characteristics (dispositional factors), appears to be more important
in determining the level of discomfort associated with the changes.

In a Norwegian study of the restructuring and downsizing of a large oil company
operating off the coast of Norway, employees’ (N = 467) attitudes were investigated
using a questionnaire (Svensen, Neset, and Eriksen 2007). The survey was
conducted after the announcement (but before the implementation) of restructuring
plans. About a third responded positively to the changes. The most important factors
predicting a positive attitude were feelings of responsibility to the organization,
involvement, participation, team leadership, and efficiency.

7.9.2 Downsizing 

Downsizing has a number of negative implications both for those who lose their jobs
and for those who remain with the organization. Furthermore, several studies show that business
earnings rarely improve as a consequence of downsizing (Mentzer 1996). Those who
remain in the organization may suffer from so-called “survivor sickness,” a condition
characterized by job insecurity and negative or cynical attitudes toward
the company. Even managers charged with executing reorganization and downsizing
processes may experience stress, feelings of guilt, and, occasionally, aggression
toward employees.

A study of employees in different sectors (including banking, insurance, and technology)
investigated reactions in the wake of downsizing (Kets de Vries and Balazs
1997). The study was based on qualitative interviews of employees who lost their
jobs (N = 60) and employees who remained with the company, so-called “survivors”
(N = 60). Subjects in the former group were categorized according to their reactions:
(a) the adapting (43%), (b) the depressed (30%), (c) those who saw new opportunities
(17%), and (d) the hostile (10%). In addition, combinations of these reaction patterns were
observed. Managers who had had to fire people were also included in the study, and
most of them said it was a difficult process. Their reactions varied widely,
ranging from distancing themselves from the problem to depression and
feelings of guilt.

A study conducted by Mari Rege at the University of Stavanger (in collaboration
with Statistics Norway) tracked a large group of employees over 5 years (Rege,
Telle, and Votruba in press). Employees in stable and secure companies were analyzed
and compared to employees in companies exposed to downsizing. Downsizing
was found to have a number of health-related consequences for the individual,
particularly for males. Among males, the results revealed higher rates of mortality (14%),
inability to work (24%), and divorce (11%). The researchers attributed the gender-related
differences to males taking job loss more seriously because male identities
are more closely tied to their paid work. The study also revealed that the negative
effects on someone who had lost his or her job depended on the sector in which
he or she worked. The most severe consequences were observed in manufacturing
industries.

7.9.3 Psychological Contracts 

When seen as isolated incidents, the reactions that managers and employees
display in the situations described above may seem strange, irrational, and even
surprising. However, they become more predictable when viewed as responses
to intense stress. Both sides have been through a very difficult process that included
losing one or more colleagues and, perhaps, friends. Moreover, they have been forced
to abandon the belief that they worked for a financially solid company.

Downsizing may be regarded as a serious breach of the psychological contract
that exists between employees and employers. Psychological contracts are individual
expectations, shaped by the organization, about how exchanges should take
place between the employee and the organization (see, for example, Rousseau 1995).
These contracts are expectations about the future that are based on trust, acceptance, and
reciprocity; they make it easier for people to plan and anticipate future events. Often
they specify the contributions expected of the employee as well as the
compensation offered by the organization in return. For example, the employee
promises to work hard, be loyal, and contribute to fulfilling the company’s mission
statement. The employer, on the other hand, promises ongoing employment, payment,
and opportunities for personal development and career advancement. Such
contracts are not necessarily the same as written contracts, and the two parties may
disagree about what the contract involves.

Psychological contracts are formed in many ways, for instance, through verbal
expression, written documents, observation of how others are treated within the
organization, and company policy or culture. They may be expressed in writing or
through stories and myths about how things have been done before. Because contracts
are formed by the person perceiving and integrating information, misinterpretations
can easily occur. A breach of contract arises when an
employee feels that the organization has not fulfilled its obligations. However, the
organization or local management may have a different view.

Studies of people starting out in a new job show that contracts are often breached.
One study revealed that 54% experienced a breach of contract during the first 2 years
(Robinson and Rousseau 1994). Thus, breaches of psychological contracts are not
unusual in organizations. The severity of the breach will affect the severity of the
reactions and consequences. Breaches may occur knowingly and intentionally or because
the business does not have the necessary resources to meet contractual requirements.
Severe breaches of contract can be devastating and lead to negative reactions
in employees, such as mistrust, anger, and wanting to quit the job. In general,
repeated breaches may degrade the relationship between the employee and the
organization (Robinson and Rousseau 1994).

A number of factors influence the severity and nature of the consequences of 
breach of contract. One aspect is whether the breach occurred on purpose and whether 
similar breaches have occurred previously (a “string” of breaches). If an employer is 
unable to keep his or her promise to provide sponsored education because of a budget 
deficit, the employee may find it more acceptable than if the manager simply thinks 
such training is a bad investment. Events in the aftermath of a contractual breach may 
help to repair the relationship or, conversely, reinforce its negative consequences.

7.10 LEADERSHIP 

Sound management is important in many aspects of the working environment,
particularly in relation to the development of a safety culture and to processes
of organizational change. A number of theories describe various types of leaders and
leadership. Leadership, naturally, does not exist in a vacuum but in a historical and cultural
context; what works in one situation may not automatically transfer to a different
organization or to a different point in time. At a time when many organizations are
constantly changing and reinventing themselves, enormous demands are placed on
managers to motivate and inspire employees. Many studies have shown a correlation
between stress (burnout) and the presence or absence of support
from managers (Lee and Ashforth 1996). Thus, it is crucial for an organization to be
able to select and develop good leaders who earn and maintain employees’ trust.

7.10.1 Three Leadership Types 

Many theories and research traditions describe what makes a good leader. Some
emphasize a leader’s personality characteristics and others focus on what he or she
actually does. A number of studies on leadership were conducted at Ohio State
University in the postwar era. Two dimensions were identified as characteristic of
effective leaders: they were considerate toward subordinates (consideration), and they
took the initiative in creating structure (initiating structure; see, for example, Judge,
Piccolo, and Ilies 2004). Other research groups have identified similar dimensions under
different labels, such as “relation-oriented leadership” and “task-oriented leadership.”

Much has been written in recent years on transactional
leadership and transformational leadership (Burns 1978; Bass 2007). These leadership
theories involve aspects concerning managers, subordinates, and the interactions
between these two sides:
• In transactional leadership, the exchange of rewards for results and completion
of tasks is emphasized. Alternatively, employees are allowed to
do their jobs as long as production targets are achieved, and the manager intervenes
only to correct errors (management by exception). This may involve a passive style
(the manager avoids action until something goes wrong) or an active style
(the manager reacts to errors made by employees).

• In transformational leadership, emphasis is not only on transactions
between employees and management (work for payment and other benefits),
but also on the leader’s ability to inspire, motivate, and devise original
ideas. Such a leader must be charismatic, set a good example, and be able to
communicate his or her vision, thus elevating employees toward the organization’s
common goal. This leadership type is also described according
to its impact on employees: transformational leadership is associated with
increased work satisfaction and motivates employees to perform better.

These two leadership styles are not mutually exclusive. Rather, they complement
each other. A manager-employee relationship often starts as a transactional relation,
that is, a process of clarifying the expectations that both sides have. However,
transformational leadership is necessary if employees are to be motivated to put
in additional effort (Bass 2007). Transformational leaders motivate employees to
work not only for immediate personal rewards, but also for the benefit of the
group, the company, or, indeed, the country. Work becomes something greater than
an activity done just to get paid, and this recognition contributes to an increase in
employees’ self-esteem and devotion to their tasks.

Evidence suggests that this categorization of leadership is not tied to a particular
type of organization or culture; the dimensions have been established in a number
of organization types, such as the armed forces and the private and public sectors
of many countries (Bass 2007). However, the ways in which the different forms of
leadership are expressed may vary between cultures. For example, the way a leader
rewards or shows appreciation for employees will differ between countries such as
Norway and Japan.

The two forms of leadership are often contrasted with laissez-faire leadership, or
a lack of leadership. Managers who exhibit such leadership do not recognize their
responsibility as leaders and do not provide assistance or communicate their opinions
on important issues regarding the organization. This form of leadership is the
least effective, and it is also the type of management under which employees
are least satisfied.

The different forms of leadership are often quantified using the Multifactor
Leadership Questionnaire (MLQ) (Bass 1985). The instrument has since been revised
and consists of statements about leadership behavior to which employees respond on a
scale from zero (the behavior is never observed) to four (the behavior is often, if not
always, observed). Three scales measure transformational leadership, three measure
transactional leadership, and only one scale measures laissez-faire leadership. A
number of studies have used this instrument or variants of it, correlating the different
scales with measures of work performance (subjective and objective).
Reviews show that positive correlations are strongest for the transformational scales
and some of the transactional scales, whereas correlations between laissez-faire
leadership and work results are negative (see, for example, Bass 2007). These findings
are generally based on North American studies; however, a clear connection
between transformational leadership and variables such as work satisfaction and
efficiency has been supported in Norwegian surveys as well (Hetland and Sandal
2003).
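The questionnaire scoring described above (ratings from zero to four grouped into transformational, transactional, and laissez-faire scales) can be sketched in Python. This is a minimal illustration, not the MLQ's published scoring procedure: the item IDs, the two-items-per-scale layout, and the scale names are hypothetical placeholders, and averaging ratings within each scale is assumed here as a common convention for rating-scale instruments.

```python
from statistics import mean

# Hypothetical scoring key: scale names and item IDs are placeholders for
# illustration only; they are NOT the actual MLQ items or scoring key.
SCALE_KEY = {
    "transformational_1": ["q01", "q02"],
    "transformational_2": ["q03", "q04"],
    "transformational_3": ["q05", "q06"],
    "transactional_1": ["q07", "q08"],
    "transactional_2": ["q09", "q10"],
    "transactional_3": ["q11", "q12"],
    "laissez_faire": ["q13", "q14"],
}

# 0 = behavior never observed ... 4 = behavior often, if not always, observed
MIN_RATING, MAX_RATING = 0, 4


def score_scales(responses: dict[str, int]) -> dict[str, float]:
    """Average one respondent's 0-4 ratings within each (hypothetical) scale."""
    for item, rating in responses.items():
        if not MIN_RATING <= rating <= MAX_RATING:
            raise ValueError(f"{item}: rating {rating} outside {MIN_RATING}-{MAX_RATING}")
    return {scale: mean(responses[i] for i in items) for scale, items in SCALE_KEY.items()}


def leadership_profile(scale_scores: dict[str, float]) -> dict[str, float]:
    """Collapse scale scores into the three broad forms of leadership."""
    def family(prefix: str) -> float:
        # Mean of all scales whose name starts with the given prefix.
        return mean(v for k, v in scale_scores.items() if k.startswith(prefix))

    return {
        "transformational": family("transformational"),
        "transactional": family("transactional"),
        "laissez_faire": scale_scores["laissez_faire"],
    }


if __name__ == "__main__":
    answers = {f"q{n:02d}": 3 for n in range(1, 13)}
    answers.update(q13=0, q14=1)
    print(leadership_profile(score_scales(answers)))
```

In this example, a respondent who rates every transformational and transactional item 3 and the two laissez-faire items 0 and 1 ends up with profile scores of 3, 3, and 0.5. Averaging rather than summing keeps every scale on the same 0 to 4 metric, so scales with different numbers of items remain comparable.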

Gender issues and leadership have been the subject of much discussion. For example,
is it true that females have different leadership styles from those of males? In
most industries, women are underrepresented in leadership positions, particularly at
the highest levels. Do female leadership styles constitute a barrier to securing those
top jobs? A meta-analysis of 45 studies that investigated gender differences found
generally minor differences between male and female leadership (Eagly, Johannesen-Schmidt,
and van Engen 2003). Women had somewhat higher scores for transformational
leadership, and men scored higher on the scales measuring laissez-faire
leadership and the two forms of transactional leadership that involve intervening only
when something goes wrong. These results favor female leadership.
In other words, no evidence supports claims that female leaders use less effective
leadership styles; quite the contrary.

Several studies have examined the relationship between personality and leadership style.
A meta-analysis of over 20 studies conducted by Bono and Judge (2004) revealed low
correlations between personality traits (the “big five” model) and the various MLQ
scales. The largest and most stable correlations were found between the transformational
leadership scales and the personality traits extroversion (mean r = .24) and
neuroticism (mean r = -.17).

7.10.2 Leadership and Safety 

Few studies on leadership in aviation have been conducted. However, a number of studies
in other sectors may shed light on the relationship between leadership and safety.
Several studies have investigated the connections among management, safety climate, and
work-related accidents. A model for the relationship among these elements was developed
by Barling, Loughlin, and Kelloway (2002). In this model, transformational leadership
affects the safety climate, which, in turn, affects the occurrence of accidents.
The model also contained the important notion of safety awareness, which implies
that employees are aware of the dangers associated with the various
operations and know what needs to be done when accidents and dangerous situations
arise. The model was tested on a group of young employees, mainly in the service
sector. Results indicate that the most important effect of transformational leadership
was its influence on safety awareness in individual employees; in turn, this affected
the safety climate and, ultimately, the number of accidents (Barling, Loughlin, and
Kelloway 2002).

Others have researched corresponding models in different sectors. For example, in a
study of factory workers in Israel, safety climate was found to be a mediating variable
between leadership styles and the number of accidents (Zohar 2002). In this study,
both transformational and transactional leadership predicted accidents; however, the
effects were mediated through a particular aspect of the safety climate labeled
“preventive action.” This dimension contained questions that mapped the extent to
which immediate superiors discussed safety issues with employees and whether they
accepted safety advice from employees. The study concluded that leadership dimensions
associated with concern for the welfare of employees and personal relations
promoted improved supervision and thus improved the safety climate and reduced
the number of accidents (Zohar 2002).


7.11 SUMMARY 

For better or worse, people are influenced by the cultural context of groups or communities
to which they belong. In this chapter, we have looked at how national, professional,
and organizational cultures influence people working in aviation. Culture
affects not only the behavior of a person, but also the way he or she communicates
and perceives other people and the world (worldview).

Aviation is an inherently international industry that, similarly to many other 
industries, faces challenges in the form of tough competition, market instability, and 
increased focus on security and terrorism. Restructuring and downsizing are of great 
importance to the welfare of employees and, in turn, their performance and duties in 
relation to the organization. Sound leadership is of decisive importance to the development
of a safety culture and to reducing unintended consequences of restructuring
and reorganization. 
8.1 INTRODUCTION

An American cowboy rodeo saying states, “There’s never been a horse that can’t be 
rode; there’s never been a rider that can’t be throw’d.” That adage applies equally 
well to aviation. There has never been a pilot so skilled that he or she cannot have 
an accident. Clearly, some pilots are exceptionally skilled and cautious. However, 
given the right set of circumstances, even they can make an error of judgment or find 
that the demands of the situation exceed their capacity or the capabilities of their 
aircraft. What sets these particular pilots apart is that these combinations of events 
and circumstances occur very rarely; their attributes, including attitudes, personality,
psychomotor coordination, aeronautical knowledge, skills, experiences, and a
host of other individual characteristics, make them less likely to experience hazardous
situations and more likely to survive the situations if they occur.

In contrast, for pilots at the low end of the skill continuum, every flight is a risky 
undertaking. In this chapter, we will explore some of the research that has attempted 
to explain, from the perspective of human psychology, how these groups of pilots 
differ, why accidents occur, and what might be done to reduce their likelihood. 

8.2 ACCIDENT INCIDENCE 

To begin, let us examine the incidence of aviation accidents so that we may understand
the extent of the problem. Table 8.1 shows the numbers of accidents and corresponding
accident rates (number of accidents per 100,000 flight hours) for a 2-year
period in the United States. From this table, the differences in accident rates of the
large air carriers (very low rates), the smaller carriers, and general aviation are evident.
From large air carriers to general aviation, the accident rate increases more than 30-fold.
To put these statistics in a slightly different light, on a per-mile basis, flying in an air
carrier is about 50 times safer than driving. However, flying in general aviation is about
seven times riskier than driving.

These accident rates are typical of the rates found in Western Europe, New 
Zealand, and Australia. For example, data from the Australian Transport Safety 
Bureau (ATSB 2007) show fixed-wing, single-engine general aviation accident rates 
(accidents/100,000 hours) of 10.26 and 7.42 for 2004 and 2005, respectively. Note 
that these rates are somewhat inflated relative to the United States because they do 
not include multiengine operations normally used in corporate aviation, which is 
traditionally one of the safest aviation settings. 

TABLE 8.1
Incidence of Accidents in the United States

                          2004               2005
                     Number   Rate(a)    Number   Rate(a)
Large air carriers       30     0.16         39     0.20
Commuter                  4     1.32          6     2.00
Air taxi                 66     2.04         66     2.02
General aviation       1617     6.49       1669     6.83

Source: Federal Aviation Administration. 2007.
(a) Rate is given as accidents per 100,000 flight hours.

This brings up an important point regarding safety statistics: it is essential to
note the basis on which the statistics are calculated. For example, in Table 8.1,
the rates are given in terms of numbers of accidents per 100,000 flight hours. This is
a commonly used denominator, but by no means the only one that is reported. Our
earlier comparison of accident risk in driving and aviation used accidents per mile
traveled. Some statistics are expressed in terms of numbers of departures (typically,
accidents per one million departures). Readers should note these denominators so
that comparisons are always made between statistics using the same denominator. In
addition, as in our comparison between the statistics from the United States and
Australia, it is important to know exactly what has been included in the
calculations. In this case, the exclusion of the very safe multiengine corporate
operations from the Australian figures could lead to the conclusion that general
aviation is safer in the United States than in Australia, a conclusion that is not
warranted by the data provided.
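The rate arithmetic behind Table 8.1, and the warning about mixing denominators, can be illustrated with a minimal Python sketch. It uses only the counts and rates printed in the table; the flight-hour exposure is back-calculated from the rounded published rates and is therefore approximate, and the per-departure conversion relies on a hypothetical average flight length, because the table reports no departure data. The function names are illustrative, not taken from any FAA tool.

```python
# Sketch: working with accident-rate denominators, using Table 8.1 figures.
# Assumption: flight-hour exposure is not printed in the table, so it is
# back-calculated from the published (rounded) rates and is approximate.

PER_HOURS = 100_000  # Table 8.1 denominator: accidents per 100,000 flight hours

# category -> {year: (accidents, rate per 100,000 flight hours)}
TABLE_8_1 = {
    "Large air carriers": {2004: (30, 0.16), 2005: (39, 0.20)},
    "Commuter": {2004: (4, 1.32), 2005: (6, 2.00)},
    "Air taxi": {2004: (66, 2.04), 2005: (66, 2.02)},
    "General aviation": {2004: (1617, 6.49), 2005: (1669, 6.83)},
}


def rate_per_100k_hours(accidents: int, flight_hours: float) -> float:
    """Accidents per 100,000 flight hours."""
    return accidents * PER_HOURS / flight_hours


def implied_flight_hours(accidents: int, rate: float) -> float:
    """Invert the rate formula to estimate the exposure behind a published rate."""
    return accidents * PER_HOURS / rate


def fold_difference(rate_a: float, rate_b: float) -> float:
    """Ratio of two rates; only meaningful if both use the same denominator."""
    return rate_a / rate_b


def per_million_departures(rate_per_100k: float, avg_hours_per_departure: float) -> float:
    """Convert a per-hour rate into accidents per one million departures.

    avg_hours_per_departure is a HYPOTHETICAL input: the table gives no
    flight-length data, so the result depends entirely on this assumption.
    """
    accidents_per_hour = rate_per_100k / PER_HOURS
    return accidents_per_hour * avg_hours_per_departure * 1_000_000


if __name__ == "__main__":
    for year in (2004, 2005):
        ga_n, ga_rate = TABLE_8_1["General aviation"][year]
        _, ac_rate = TABLE_8_1["Large air carriers"][year]
        print(year,
              f"GA/large-carrier ratio: {fold_difference(ga_rate, ac_rate):.1f}x,",
              f"implied GA flight hours: {implied_flight_hours(ga_n, ga_rate):,.0f}")
```

With the table's figures, the general aviation rate is roughly 41 times the large-carrier rate in 2004 and about 34 times in 2005. The per-departure helper shows why mixing denominators misleads: the same per-hour rate yields a different per-departure figure for every assumed average flight length, and that length is not reported here.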

8.3 CAUSES OF ACCIDENTS 

For every complex question, there is a simple answer—and it’s wrong. 

Attributed to H. L. Mencken 

Before we begin to talk about the causes of accidents, we need to make clear what
we mean by a “cause.” Step away from the flight line for a moment and into the
chemistry laboratory. If we were to put a few drops of a solution containing silver
nitrate (AgNO3) into another solution that contains sodium chloride (NaCl, common
table salt), we would observe the formation of some white particles (silver chloride,
AgCl) that would sink to the bottom of our test tube. This simple test for the presence
of chloride in water by the addition of aqueous silver nitrate is, in fact, one of
the most famous reactions in chemistry, and it is among the first learned by all budding
chemists. The point to be made here is that this reaction and the formation of
the precipitate will happen every single time that we mix solutions of silver nitrate
and sodium chloride. The precipitate will not form unless we add the silver nitrate.
The addition of the silver nitrate to the sodium chloride solution is a necessary and
sufficient condition for the formation of the precipitate. We may truly say that one
causes the other.
Now step back outside the laboratory and consider what happens in the real world.
For example, let us imagine an individual driving to work one morning when traffic 
is very heavy; he is following closely behind the vehicle ahead. Occasionally, that 
vehicle will brake sharply, so he needs to react quickly and apply the brakes to keep 
from hitting it. This happens dozens, perhaps hundreds, of times during the trip and 
he is always successful in avoiding an accident. During the same trip, he listens to 
music on the radio and occasionally changes the station by glancing at the radio and 
pressing the buttons to make a selection. He may do this several times during the 
course of the trip, also without incident. There may even be occasions when, as he 
is changing stations on the radio, the vehicle ahead brakes, and he glances up just 
in time to notice the brake lights and slow down. Fortunately, he is a careful driver 
and usually maintains an adequate spacing between his vehicle and the vehicle he is 
following, so he is always able to react in time, even if he is temporarily distracted 
by the radio. He may do this every day for years, without incident. 

However, on one particular morning he is delayed leaving the house, so he does 
not get his usual cup of coffee and is feeling a little sleepy. He is also feeling a bit 
rushed because he needs to be at the office at the usual time, but he has gotten a late 
start. Perhaps this has led him to follow the vehicle ahead a little more closely than 
usual and, now, as he reaches over to change the radio, the driver ahead brakes more 
sharply than usual; he does not notice the vehicle’s brake lights quite soon enough 
or react quickly enough to slow his vehicle. An accident occurs. But what was the 
cause of the accident? 

From the official standpoint (the one that will go on the police report), the individual
in the vehicle behind the braking driver was the cause, and this is yet another
example of human error. However, that is not a very satisfying explanation. It is not
satisfying because it labels as an error actions that were taken on almost every trip for
many years. Surely, there have been many days on which the individual left the house
late and hurried to make up time. Surely there have been days when he felt a little 
sleepy when driving to work. Likewise, he has handled heavy traffic and changing 
radio stations innumerable times previously. All of these actions and conditions have 
existed previously and we have not called them errors and the causes of an accident 
because, until this particular day, no accident has occurred. None of these conditions 
and events is necessary and sufficient for an accident to occur. However, each of 
them, in its own small way, increased the likelihood of an accident. 

Therefore, we suggest that the best way to understand the causes of accidents is to
view them as events and conditions that increase the likelihood of an adverse event
(an accident) occurring. None of the usual list of causes (following too closely,
inattention, sleepy driver, distraction) will cause an accident each and every
time it is present. However, each will independently increase the likelihood of an
accident. Moreover, their joint presence may increase the likelihood far more than
the simple sum of their independent effects. For example, following too closely in
traffic and driving while drowsy both increase the risk of an accident, let us say by
10% each. However, following too closely in traffic while drowsy might increase the
risk of an accident by 40%, not the 20% obtained by simply summing their independent
contributions. Thus, the combination of these two conditions is far more
dangerous than either is by itself.
Causes are best understood as facilitators of accidents rather than as determinants
of accidents. They increase the probability that an accident will occur, but they do
not demand that it occur. This argument implies that accidents generally have multiple
facilitating components (causes).
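The arithmetic of the drowsy-driver example (two facilitators of 10% each whose joint effect is 40% rather than 20%) can be made explicit with a short Python sketch. The 10% and 40% figures are the illustrative values from the text, not measured data, and the function names are hypothetical. For contrast, the sketch also computes what a multiplicative model would predict (each factor scaling the risk by one plus its increase); that model is an assumption added here, not part of the original argument.

```python
# Sketch of the "joint effect larger than the sum" arithmetic from the
# drowsy-driver example. The 10% and 40% figures are the text's illustrative
# numbers, not measured data.


def additive_joint_increase(*increases: float) -> float:
    """Joint risk increase expected if independent effects simply add."""
    return sum(increases)


def multiplicative_joint_increase(*increases: float) -> float:
    """Joint increase expected if each factor multiplies risk by (1 + increase)."""
    relative_risk = 1.0
    for inc in increases:
        relative_risk *= 1.0 + inc
    return relative_risk - 1.0


def excess_over_additive(observed_joint: float, *increases: float) -> float:
    """Risk increase not explained by simple addition (positive = synergy)."""
    return observed_joint - additive_joint_increase(*increases)


FOLLOWING_TOO_CLOSELY = 0.10  # illustrative, from the text
DROWSY = 0.10                 # illustrative, from the text
OBSERVED_BOTH = 0.40          # illustrative joint effect, from the text

if __name__ == "__main__":
    print(f"additive expectation:       {additive_joint_increase(FOLLOWING_TOO_CLOSELY, DROWSY):.0%}")
    print(f"multiplicative expectation: {multiplicative_joint_increase(FOLLOWING_TOO_CLOSELY, DROWSY):.0%}")
    print(f"excess over additive:       {excess_over_additive(OBSERVED_BOTH, FOLLOWING_TOO_CLOSELY, DROWSY):.0%}")
```

Run as a script, it prints 20%, 21%, and 20%. Even if the two facilitators compounded multiplicatively, the expected joint increase would be only about 21%, so a joint effect of 40% implies a genuine interaction in which each condition amplifies the other.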

Most authors, at least in recent years, acknowledge in the introduction to their
research that there is no single cause for accidents and then proceed to ignore that
statement in the conduct and interpretation of their research. Arguably, the present authors
could be included in that indictment. However, to atone for those past literary
indiscretions, let us now reiterate that point: There are no single causes for accidents.

Usually the “cause” is simply the last thing that happened before the crash. As this was 
being written, an Airbus was being extracted from the Hudson River after both engines 
failed at 3,200 feet during takeoff from LaGuardia Airport. The newspapers reported that 
the cause of the crash was the engines’ ingestion of a flock of geese. However, they also 
reported that the captain of the flight was an experienced glider pilot, with an exceptional 
interest in safety. Clearly, multiple causes were at work here—the flock of geese may have 
caused the engines to quit, but the experience and skill of the captain may have been the 
cause of the relatively benign water landing that resulted in no fatalities. 

In exploring cause-and-effect relationships, we may move away from the final 
cause to whatever extent results in a comprehensive understanding of the event. For 
example, we might ask what caused the geese to be in the flight path of the aircraft.
Did placing a major airport along a river in the flyway for migratory waterfowl play
some part? We might also ask what part the pilot’s gliding experiences played in the
outcome. Did they “cause” a catastrophic event to become an exciting, but injury-free event? When we take a more situated view, we recognize that there are no “isolated” events. Everything happens in a context.

Each accident occurs because of a complex web of interacting circumstances, 
including environmental conditions, pilot attributes, aircraft capabilities, and support 
system (e.g., air traffic control, weather briefer) weaknesses. A complete explanation 
of how those elements interact to produce an accident is far beyond our current science. Science does not, at this time, allow us to predict with anything approaching
certainty that, under a well-specified set of circumstances, an accident will occur; 
this is definitely not the chemistry laboratory. 

To begin, we do not know the set of circumstances that should be specified or the 
values to assign to the various elements so that they combine properly. Despite this 
abundant ignorance, we are able to make some statements regarding probabilities. 
That is, we are able to say with some confidence that accidents are more likely to 
occur under some circumstances than under other circumstances. The identification 
of these circumstances and the establishment of the degree of confidence with which 
we may assert our beliefs make up the topic to be considered next. 

Many efforts have been made to identify the causes for aircraft accidents over the 
years. Although they suffer from the implicit assumption of single causes, which we 
have dismissed as naive, these efforts nevertheless can make a contribution to our 
understanding of accident causality by identifying some of the circumstances and 
attributes associated with accidents. 

To look at accident causality in a slightly different way, consider the following 
parable. 
Imagine you are standing in a calm pool of water. You reach into your
pocket and grasp a handful of pebbles and toss them into the pool. Each of the
pebbles disturbs the surface of the pool, creating an exceptionally complex
set of interacting waves. Because some of the pebbles are larger than others,
their waves are higher than those of the smaller pebbles. Where the waves
intersect, they combine algebraically, depending on the phase and magnitude
of each individual wave. At some points, they combine to produce a wave that
reaches high above the surface. At other points, they cancel each other out.
Occasionally, several of the individual waves will intersect at just the right
moment to produce a freak wave of exceptional height that will cause the water
to lap over the tops of your boots—an adverse event. For all intents and purposes, the occurrence of this freak wave is random and largely unpredictable.
Its production depends on the number of pebbles plucked from your pocket, the
size of the pebbles, the height of their toss into the pool, and the amount of dispersion of the pebbles as they fall down into the water. It will also depend upon
who else is standing in or around the pool tossing pebbles into the water.

Observation of many tosses of pebbles into the pool and the use of statistics 
will allow us to predict that a freak wave will occur every so often—let us 
say once in every 100 tosses. However, we cannot reliably say whether any 
particular toss will produce the freak wave or where on the surface of the pool 
this wave will occur. Even so, we are not powerless to prevent its occurrence. 
We might, for instance, reduce the number of pebbles we throw. We might 
reduce the size of the pebbles. We might change the vigor with which we throw 
them into the air. We might throw them less vertically and more laterally so 
as to increase their dispersion. We might also make rules about how often the 
people on the shore can throw pebbles into the pool.

Freak waves might still occur, but now we might only see them once in every 
1,000 tosses. Further reductions might be achieved by, for example, coating the 
pebbles with some substance that reduces their friction as they pass through 
the surface of the water, hence producing individual waves of still lower magnitude. However, even with all these procedural and technological advances,
as long as we toss pebbles into the pool, there is some nonzero chance that a 
freak wave will occur. Completely eliminating freak waves (and the attendant 
adverse event—wet feet) requires that we and our companions abandon our 
practice of tossing pebbles into the pool or that we take a radically different 
approach. For example, we might wait for winter, when the pool will freeze. 
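The statistical point of the parable can be sketched with the rule for independent trials: if each toss has probability p of producing a freak wave, the chance of at least one freak wave in n tosses is 1 - (1 - p)^n. The minimal Python sketch below assumes the tosses are independent, an idealization the parable itself does not claim, and uses its illustrative rates of 1 in 100 and 1 in 1,000.

```python
def p_at_least_one(p: float, n: int) -> float:
    """Probability of at least one adverse event in n independent trials,
    each with per-trial probability p."""
    return 1.0 - (1.0 - p) ** n


for label, p in [("before countermeasures", 1 / 100),
                 ("after countermeasures", 1 / 1000)]:
    for n in (1, 100, 1000):
        print(f"{label:>22}: p={p:.3f}, n={n:>4} tosses -> "
              f"P(wet feet) = {p_at_least_one(p, n):.3f}")
```

Cutting the per-toss rate tenfold sharply lowers the risk over any fixed number of tosses, but for any p greater than zero the probability still climbs toward certainty as n grows. Only stopping the tosses (n = 0) drives it to zero.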


8.4 CLASSIFICATION OF AIRCRAFT ACCIDENTS 

According to the Aircraft Owners and Pilots Association (AOPA 2006), causes of 
accidents may be broken down into three categories: 


• Pilot-related accidents arise from the improper action or inaction of the
pilot.

• Mechanical/maintenance accidents arise from failure of a mechanical
component or errors in maintenance.

• Other/unknown accidents include causes such as pilot incapacitation, as
well as accidents for which a cause could not be determined.


TABLE 8.2

Causes of General Aviation Accidents in 2005

Major Cause                All Accidents        Fatal Accidents
Pilot                      1,076   74.9%        242   82.9%
Mechanical/maintenance       232   16.2%         22    7.5%
Other/unknown                128    8.9%         28    9.6%
Total                      1,436                292

Table 8.2 shows the distribution of accidents among those three categories of cause 
for general aviation accidents in 2005. Clearly, the predominant major cause for both 
levels of severity was the pilot. Considering only those accidents in which the pilot 
was the major cause, the AOPA further divided the accidents among the categories 
shown in Table 8.3. Interpretation of the data presented in Table 8.3 is made difficult
by the mixture of stage-of-flight categories (e.g., preflight/taxi, takeoff/climb, and
landing) with two categories (fuel management, weather) that are conceptually unrelated to the other stage-of-flight categories. This admixture of taxonomic elements
muddies the interpretation of an analysis of only dubious initial value. At most, one
might inspect these data and conclude that maneuvering flight is a dangerous phase.
However, these data say nothing about why maneuvering flight is dangerous, nor do
they demonstrate that it is relatively more dangerous than other stages of flight, because
there is no control for exposure—the amount of time spent in that flight stage.


TABLE 8.3

Accident Categories for Pilot-Related Accidents

Category            Total           Fatal
Preflight/taxi       38    3.5%      1    0.4%
Takeoff/climb       165   15.3%     33   13.6%
Fuel management     113   10.5%     20    8.3%
Weather              49    4.6%     33   13.6%
Other cruise         21    2.0%     14    5.8%
Descent/approach     49    4.6%     25   10.3%
Go-around            43    4.0%     15    6.2%
Maneuvering         122   11.3%     80   33.1%
Landing             446   41.4%      8    3.3%
Other                30    2.8%     13    5.4%

Source: Aircraft Owners and Pilots Association. 2006. The Nall report, p. 8. Frederick, MD:
Author.
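The point about exposure can be illustrated with a rate calculation: divide the number of accidents in a phase by the time spent in that phase. In this minimal Python sketch the fatal-accident counts come from Table 8.3, but the exposure hours are hypothetical values invented only to show the calculation; they are not from the AOPA report and say nothing about the real danger of any phase.

```python
# Fatal pilot-related accidents by phase, from Table 8.3.
fatal_counts = {
    "Maneuvering": 80,
    "Takeoff/climb": 33,
    "Landing": 8,
}

# HYPOTHETICAL exposure (flight hours spent in each phase). These numbers
# are invented for illustration only; they are not from the AOPA report.
exposure_hours = {
    "Maneuvering": 4_000_000,
    "Takeoff/climb": 500_000,
    "Landing": 400_000,
}

total_fatal = 242  # all fatal pilot-related accidents in Table 8.3


def rate_per_100k_hours(accidents: int, hours: float) -> float:
    """Accidents per 100,000 hours of exposure."""
    return accidents / hours * 100_000


for phase, count in fatal_counts.items():
    share = count / total_fatal * 100
    rate = rate_per_100k_hours(count, exposure_hours[phase])
    print(f"{phase:<14} share of fatal accidents {share:5.1f}%   "
          f"rate {rate:5.2f} per 100,000 h")
```

Under these made-up exposures, maneuvering accounts for the largest share of fatal accidents yet has a lower rate than takeoff/climb. Different exposure figures could reorder the phases again, which is why counts alone cannot show that one phase is more dangerous than another.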

We belabor this point to reinforce the notion that categories are not causes.
Categories do not explain why accidents occur. They simply point to times, conditions, and circumstances under which accidents are more likely. To illustrate
this point one last time, an accident does not occur simply because the pilot is
in maneuvering flight. Consider one that occurred while the pilot was in maneuvering
flight and decided to buzz his friend's house and was distracted and flew too slowly and stalled
the aircraft and encountered a downdraft and was flying an underpowered aircraft
and had too much load on board and ...—the list could go on for a very long time.
The reader should recall our earlier discussion about how each of these “causes” 
increases the likelihood of an accident while not guaranteeing its occurrence. 

In a seminal and frequently cited study, Jensen and Benel (1977) noted that all aircrew errors could be classified into one of three major categories based on behavioral
activities: procedural, perceptual-motor, and decisional tasks. This conclusion was
based on an extensive review of all U.S. general aviation accidents occurring from
1970 to 1974 using data from the National Transportation Safety Board (NTSB). Of
the fatal accidents involving pilot error during that period, Jensen and Benel found
that 264 were attributable to procedural errors, 2,496 had perceptual-motor errors,
and 2,940 were characterized as having decisional errors. Examples of procedural
tasks include management of vehicle subsystems and configuration; related errors
would include retracting the landing gear instead of flaps or overlooking checklist
items. Perceptual-motor tasks include manipulating flight controls and throttles, and
errors would include overshooting a glide-slope indication or stalling the aircraft.
Decisional tasks include flight planning and in-flight hazard evaluation; errors would
include failing to delegate tasks in an emergency situation or continuing flight into
adverse weather.

Diehl (1991) analyzed U.S. Air Force and U.S. civil air carrier accident data for 
accidents that occurred during 1987, 1988, and 1989. His analysis of the air carrier 
data indicated that 24 of the 28 major accidents (those resulting in destroyed aircraft and/or fatalities) involved aircrew error. Of these accidents, 16 procedural, 21
perceptual-motor, and 48 decisional errors were cited (the errors sum to more than 
24 because some accidents involved multiple errors). During the same time period, 
169 major mishaps were reported for the U.S. Air Force; these involved destruction 
of the aircraft, over one million dollars in damages, or fatalities. Of the 169 major 
mishaps, 113 involved some type of aircrew error. These included 32 procedural, 
110 perceptual-motor, and 157 decisional errors. These types of errors were labeled 
“slips,” “bungles,” and “mistakes,” respectively (Diehl 1989). 

The comparison of the incidence of these three types of errors among the three 
aviation sectors is depicted in Table 8.4. It is interesting to note that even though 
these three sectors differ vastly with regard to many aspects (e.g., training, composition of aircrew, type of aircraft, and type of mission), the relative incidence of the
errors is remarkably similar for all three groups. 
TABLE 8.4

Types of Aircrew Errors in Major Accidents

                    Category of Error
Kind of Operation   Procedural   Perceptual-Motor   Decisional
                    "Slips"      "Bungles"          "Mistakes"

General aviation    5%           44%                52%
Airlines            19%          25%                56%
Military            11%          37%                53%


Source: Adapted from Diehl, A. E. 1991. Paper presented at the 22nd International Seminar of the International Society of Air Safety Investigators. Canberra: November 1991.
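The percentages in Table 8.4 follow from the error counts reported in the text: Jensen and Benel's (1977) general aviation counts and Diehl's (1991) airline and military counts. A short Python check, which assumes each row's percentages are taken over the total number of errors cited in that sector (not over accidents), reproduces the table to the nearest whole percent.

```python
# Error counts cited in the text: (procedural, perceptual-motor, decisional).
error_counts = {
    "General aviation": (264, 2_496, 2_940),  # Jensen and Benel (1977)
    "Airlines": (16, 21, 48),                 # Diehl (1991)
    "Military": (32, 110, 157),               # Diehl (1991)
}


def percent_shares(counts):
    """Each count as a whole-number percentage of the row total."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]


for sector, counts in error_counts.items():
    proc, pm, dec = percent_shares(counts)
    print(f"{sector:<17} procedural {proc:>2}%  perceptual-motor {pm:>2}%  "
          f"decisional {dec:>2}%")
```

Because these are shares of errors rather than of accidents, and one accident can involve several errors, they describe the mix of error types, not the proportion of accidents involving each type.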


The analyses discussed so far were largely conducted on an ad hoc basis. That is,
they were not conducted on the basis of a specific, well-defined theory of why accidents occur. However, the work of Perrow (1984) on the nature of accidents in tightly
coupled systems and the work by Reason (1990, 1997) have led to one theoretical
conceptualization of why accidents occur. This theory, most widely articulated by
Reason, is commonly referred to as the “swiss-cheese model” and suggests that governments, organizations, and people create barriers to the occurrence of accidents.
Examples of barriers include regulations that prescribe certain rest periods between
flights, checklists that must be followed during the planning and conduct of a flight,
procedures for the execution of an approach to landing, and standard operating procedures for resolving normal and abnormal situations.

Each of these barriers is created to prevent or require behavior that results in safe 
operations. As long as the barrier is in place and intact, no errors associated with that 
barrier can occur. The barrier acts as a shield to prevent accidents from occurring or 
to prevent the effects of failures of other barriers from resulting in an accident. One 
might envision these barriers as stacked one against the other. However, no barrier 
is perfect. Each might be described as having “holes”—areas in which the defenses 
associated with a particular barrier are weak or missing—hence, the swiss-cheese 
description, as depicted in Figure 8.1. 

The swiss-cheese model of accident causation proposed by Reason was operationalized by Wiegmann and Shappell (1997, 2003), who developed a system for categorizing
accidents according to the sequential theory proposed by Reason. Wiegmann and Shappell
termed this approach, which is a taxonomy that describes the human factors that contribute to an accident, the human factors analysis and classification system (HFACS). The
system has four levels arranged hierarchically. At the highest level are the organizational
influences. Next are aspects of unsafe supervision, followed by the preconditions for
unsafe acts. Finally, at the lowest level, are the unsafe acts of operators. These levels correspond to the barriers to accidents shown in Figure 8.1. These major components may
be further broken down into the following elements (Shappell and Wiegmann 2000):

• organizational influences
    • resource management
    • organizational climate
    • organizational process
• unsafe supervision
    • inadequate supervision
    • planned inappropriate operations
    • failure to correct a problem
    • supervisory violations
• preconditions for unsafe acts
    • substandard conditions of operators
        - adverse mental states
        - adverse physiological states
        - physical and mental limitations
    • substandard practices of operators
        - crew resource management
        - personal readiness
• unsafe acts of operators
    • errors
        - skill based
        - perceptual
        - decisional
    • violations
        - routine—habitual departures from rules sanctioned by management
        - exceptional—departures from rules not sanctioned by management

FIGURE 8.1 The swiss-cheese model of accident causation.
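When analysts code an accident report with HFACS, each contributing factor is assigned to a node in this hierarchy. A minimal sketch of the hierarchy as a nested Python dictionary follows, with a hypothetical helper, find_path, that returns the chain of levels above a given category. The names are ours for illustration and are not part of any published HFACS software.

```python
# HFACS hierarchy as listed in the text (Shappell and Wiegmann 2000).
# Inner dicts hold subcategories; lists hold the lowest-level categories.
HFACS = {
    "organizational influences": [
        "resource management",
        "organizational climate",
        "organizational process",
    ],
    "unsafe supervision": [
        "inadequate supervision",
        "planned inappropriate operations",
        "failure to correct a problem",
        "supervisory violations",
    ],
    "preconditions for unsafe acts": {
        "substandard conditions of operators": [
            "adverse mental states",
            "adverse physiological states",
            "physical and mental limitations",
        ],
        "substandard practices of operators": [
            "crew resource management",
            "personal readiness",
        ],
    },
    "unsafe acts of operators": {
        "errors": ["skill based", "perceptual", "decisional"],
        "violations": ["routine", "exceptional"],
    },
}


def find_path(tree, target, path=()):
    """Return the tuple of levels from the top of the hierarchy down to
    `target`, or None if the category is not in the taxonomy
    (hypothetical helper, for illustration only)."""
    if isinstance(tree, list):
        return path + (target,) if target in tree else None
    for name, subtree in tree.items():
        if name == target:
            return path + (name,)
        found = find_path(subtree, target, path + (name,))
        if found:
            return found
    return None


print(" > ".join(find_path(HFACS, "adverse mental states")))
```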

HFACS has been used in a variety of settings, including analyses of accidents in 
the U.S. military services and analyses of accidents among air carriers and general 
aviation in the United States (e.g., Shappell and Wiegmann 2002, 2003; Shappell et 
al. 2006). It has also been used in settings outside the United States (e.g., Gaur 2005; 
Li and Harris 2005; Markou et al. 2006). 

In a study for the Australian Transport Safety Bureau, Inglis, Sutton, and McRandle
(2007) compared U.S. and Australian accident causes using HFACS. The results of
their analyses are shown in Table 8.5. The authors concluded:

The proportion of accidents that involved an unsafe act was similar: of the 2,025 accidents in Australia, 1,404 (69%) were identified as involving an unsafe act while 13,700
accidents (72%) in the US involved an unsafe act. Moreover, the pattern of results
between United States and Australian accidents was remarkably similar. The rank
order of unsafe act categories was the same in the accident sets for both countries.
Skill-based errors were by far the most common type of aircrew error followed by
decision errors, violations and perceptual errors, in that order. (p. 39)
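The percentages in Table 8.5 and the rank-order claim in this passage can be recomputed from the frequencies. In the sketch below, each percentage is the count divided by the number of accidents involving an unsafe act (1,404 and 13,700); because one accident can involve more than one kind of unsafe act, the percentages within a country sum to more than 100.

```python
unsafe_act_counts = {
    "Australia": {"Skill-based error": 1_180, "Decision error": 464,
                  "Perceptual error": 85, "Violation": 108},
    "United States": {"Skill-based error": 10_589, "Decision error": 3_996,
                      "Perceptual error": 899, "Violation": 1_767},
}
accidents_with_unsafe_act = {"Australia": 1_404, "United States": 13_700}


def percentages(counts, n):
    """Percentage of accidents involving each unsafe-act category."""
    return {act: round(100 * c / n, 1) for act, c in counts.items()}


def rank_order(counts):
    """Categories sorted from most to least frequent."""
    return sorted(counts, key=counts.get, reverse=True)


for country, counts in unsafe_act_counts.items():
    print(country, percentages(counts, accidents_with_unsafe_act[country]))

same = rank_order(unsafe_act_counts["Australia"]) == rank_order(
    unsafe_act_counts["United States"])
print("Same rank order in both countries:", same)
```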

TABLE 8.5

Accidents Associated with Each HFACS Unsafe Act

                         Australia             United States
Unsafe Act            Frequency      %        Frequency      %
Skill-based error         1,180   84.0           10,589   77.3
Decision error              464   33.0            3,996   29.2
Perceptual error             85    6.1              899    6.6
Violation                   108    7.7            1,767   12.9
Sample size               1,404                  13,700

Source: Inglis, M., Sutton, J., and McRandle, B. 2007. Human factors analysis of Australian aviation
accidents and comparison with the United States (aviation research and analysis report—
B2004/0321). Canberra: Australian Transport Safety Bureau.

Although HFACS has achieved widespread use, it is not without its limitations,
critics, and alternatives. For example, it is important to note that HFACS is a secondary analysis process. That is, HFACS does not deal with primary data. Rather, it
deals with data generated by accident investigators and is in that sense removed from
the reality of the accident. In contrast, consider a botanist classifying a new leaf. The 
botanist examines the leaf itself—the shape, arrangement of veins, coloration, and 
perhaps even the chemical composition of the leaf. The botanist does not examine 
a narrative description of someone else’s impressions of a leaf. Yet, this is precisely 
the case with HFACS. The analysts using HFACS use the narrative report of an accident prepared by an accident investigator as their data. Although studies (Shappell
et al. 2006; Wiegmann and Shappell 2001) have shown that these analysts exhibit
good reliability in their judgments, the results are still subject to the validity, or lack
thereof, of the original accident investigation. Further, the analysts are largely confined to examining only those data that the original investigator considered relevant
and/or those data required by the regulatory body responsible for the investigation.

At a more basic level, the use of “disembodied data,” to use Dekker’s (2001) 
phrase, to explain a complex event after the fact is arguably a futile endeavor. It is 
only with hindsight that we are able to label decisions as poor judgment or actions 
as mistakes. Taken within the context of the situation, these decisions and actions 
may well have seemed entirely appropriate and correct to the individuals at the time, 
given the knowledge they possessed. Further, the utility of assigning accidents to 
categories is not entirely clear. Knowing that an accident belongs to a specified category does not imply that we know why the accident occurred. To know that an
accident occurred because of a “skill-based pilot error” does not tell us why what we
have labeled as an error after the fact occurred. Labeling is not the same as understanding. Labeling will never lead to prevention, but understanding may.

Notwithstanding these criticisms, however, if accidents need to be assigned to categories (perhaps for some actuarial or political reason), then HFACS is probably the method
of choice. However, alternatives exist, and the interested reader is directed to Beaubien 
and Baker’s (2002) excellent review of current taxonomic methods applied to aviation 
accidents for a comprehensive overview and comparative analyses of this topic. 

8.5 SPECIAL PROBLEMS IN DOING RESEARCH ON ACCIDENTS 

Several problems arise when one is conducting research on the causes of accidents. 
Arguably, the dominant problem is the scarcity of accidents. Before the reader tears 
this book to shreds and writes an angry letter to the editor of his or her local newspaper condemning bloodthirsty aviation researchers, let us state that we are not wishing for more accidents. Rather, we are pointing out that the prediction of rare events
presents some special difficulties from a statistical standpoint, the details of which 
are beyond the scope of this book. Some of the difficulties can be illustrated with a 
simple example, however. 

Let us assume that we are interested in predicting the occurrence of fatal accidents 
among general aviation pilots in the United States. There are roughly 300,000 general 
aviation pilots in the United States and every year about 300 fatal accidents occur. 
(For the sake of convenience, let us use rounded numbers that are approximately 
accurate.) Thus, if accidents are purely random, the chance of any particular pilot 
being in a fatal accident is 0.001 (300 divided by 300,000) or one out of a thousand. 
Imagine, then, that we are trying to evaluate whether pilots with beards have more fatal
accidents than pilots without beards. We might begin our study in January by identifying a group of 100 pilots, half with beards and half without. Then in December we
would see how many are still alive. Fortunately for the pilots, but unfortunately for
our research project, all the pilots will almost certainly still be alive at the end of the
year—thus telling us nothing about the accident-causing properties of beards. The
difficulty is that if only one pilot out of a thousand has an accident during a year, then
we would expect only about 0.1 accidents among the hundred pilots in our study group.
Unless the effect upon safety of having (or not having) a beard is
tremendous, we are unlikely to obtain any results of interest.
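Assuming each pilot independently has the rounded 0.001 annual chance of a fatal accident used above, the expected number of accidents among n pilots is 0.001 × n, and the chance that none of them has an accident is (1 - 0.001)^n. A minimal Python sketch of that arithmetic:

```python
P_FATAL = 300 / 300_000  # rounded annual chance of a fatal accident per pilot


def expected_accidents(n_pilots: int, p: float = P_FATAL) -> float:
    """Expected number of fatal accidents in a year among n_pilots."""
    return n_pilots * p


def p_no_accidents(n_pilots: int, p: float = P_FATAL) -> float:
    """Probability that none of n_pilots has a fatal accident in a year,
    assuming pilots are independent of one another."""
    return (1 - p) ** n_pilots


for n in (50, 100, 10_000, 100_000):
    print(f"{n:>7} pilots: expect {expected_accidents(n):7.2f} accidents; "
          f"P(no accidents) = {p_no_accidents(n):.3f}")
```

With 100 pilots the expected count is 0.1 accidents, and roughly nine studies in ten like the one described would observe no accidents at all. Only samples in the tens of thousands of pilots can be expected to produce even a handful of accidents in each group.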

Technically, what we are lacking here is variance (or variability) in the criterion
(accident involvement). If there is no variance, then the observation can contain no
information. This is analogous to trying to find the smartest person in the class by
giving all the students a test. If the test is too easy, then all the students get perfect scores, and we cannot tell from the scores who is the smartest student. (For
additional information, consult Nunnally [1978] for his description of the effects of
extreme p/q splits on the point-biserial correlation.)
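The effect of an extreme split can be quantified under one common textbook assumption: that accident involvement reflects a normally distributed underlying variable cut at the point where a proportion p of pilots have accidents. The largest point-biserial correlation any predictor can then reach is h/√(pq), where h is the height of the standard normal curve at the cut point (the usual conversion between biserial and point-biserial correlations, with the biserial set to 1). A Python sketch using only the standard library:

```python
from statistics import NormalDist


def max_point_biserial(p: float) -> float:
    """Upper bound on the point-biserial correlation for a dichotomy with
    proportion p in one category, assuming a normal underlying variable."""
    q = 1.0 - p
    z = NormalDist().inv_cdf(q)   # cut point leaving proportion p above it
    h = NormalDist().pdf(z)       # height of the standard normal at the cut
    return h / (p * q) ** 0.5


for p in (0.5, 0.1, 0.01, 0.001):
    print(f"p = {p:<6} maximum point-biserial r = {max_point_biserial(p):.3f}")
```

Under this assumption, an even split allows a correlation of about 0.80, but when only one pilot in a thousand has a fatal accident, even a predictor perfectly related to the underlying variable could correlate no more than about 0.11 with accident involvement.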

This limitation leads to the use of statistical methods that do not require normal
distributions, such as nonparametric tests (e.g., chi-square) and regression models
designed for counts of rare events (e.g., Poisson and negative-binomial regression),
and, on occasion, very large samples. It also leads to the use of measures other than
accident involvement as the criteria for studies.

8.5.1 Is a Close Call Almost the Same as an Accident? 

It can be argued that accidents can be thought of as the tip of an iceberg. There are 
relatively few accidents, but there are many more incidents and hazardous events 
that did not result in an accident. Sometimes the difference between an incident and 
an accident is very slim—perhaps only a matter of a few feet in clearing a tree on 
takeoff. Thus, these incidents may represent instances in which, had the circumstances been only slightly different (perhaps a slightly hotter day, just a little more
fuel on board, a few pounds less air pressure in the tires), the incident would have
become an accident.

It has been suggested (Hunter 1995) that incidents and hazardous events could 
be considered surrogates for the actual measure of interest—accident involvement. 
Finding a significant relationship between some measure of interest (e.g., some personality trait) and the surrogate measure of number of hazardous events experienced
during the previous year would therefore be an indication that the measure may be
related to accident involvement. This approach has been taken in several of the studies described later in this chapter.

The prevalence of incidents and hazardous events has been examined in the United 
States (Hunter 1995) and in New Zealand (O’Hare and Chalmers 1999). Hunter conducted a nationwide survey of U.S. pilots using a set of questions he termed the
“hazardous events scale” (HES). The HES consisted of questions that asked respondents to indicate the number of times they were involved in potentially hazardous
events. The questions and the responses for private and commercial pilots are given 
in Table 8.6. For this same group of respondents, approximately 9% of the private 
pilots and 17% of the commercial pilots reported having been in an aircraft accident 
at some point in their careers. Clearly, these were nonfatal accidents (because the
pilots were alive to respond to the survey), which are about five to six times more
prevalent than fatal accidents (i.e., there were about 300 fatal accidents in 2005 out
of about 2,000 total accidents).


TABLE 8.6

Reports of Hazardous Events among U.S. Pilots
(percentage of pilots reporting each number of eventsᵃ)

                                                     Private                    Commercial
Event                                           0    1    2    3   ≥4        0    1    2    3   ≥4
Low fuel incidents                             80   16    3    1    0       66   24    7    2    2
On-airport precautionary or forced landing     54   23   11    4    8       41   21   15    7   17
Off-airport precautionary or forced landing    93    5    1    0    0       82   10    3    2    3
Inadvertent stalls                             94    5    1    0    0       90    6    2    0    1
Become disoriented (lost)                      83   14    2    0    0       83   13    3    1    0
Mechanical failures                            55   27   10    4    4       33   26   17    9   16
Engine quit due to fuel starvation             93    6    1    0    0       84   12    3    1    1
Fly VFR into IMC                               77   15    6    1    2       78   14    5    2    2
Become disoriented (vertigo) while in IMC      95    4    1    0    0       91    7    2    0    0
Turn back due to weather                       29   21   19   13   22       23   16   18   11   32

Source: From Hunter, D. R. 1995. Airman research questionnaire: Methodology and overall results
(technical report no. DOT/FAA/AM-95/27). Table 21. Washington, D.C.: Federal Aviation
Administration.

ᵃ Number of times this event has been experienced during a flying career.

In a survey of New Zealand pilots, O’Hare and Chalmers (1999) found that 
encounters with potentially hazardous events were fairly common. Furthermore, the 
experiences reported by the New Zealand pilots were remarkably similar to those of 
the U.S. pilots. For example, the proportions of pilots who had entered IMC (slightly 
under 25%) or inadvertently stalled an airplane (about 10%) were almost identical in 
both samples. It is interesting that, even given the geographic disparity between the 
United States and New Zealand, the experiences of pilots in the two countries were 
so similar. This is a finding to which we will return later in this chapter. 

Findings such as these have encouraged researchers to consider the utility of such
measures in accident research. Because many more incidents than fatal accidents
occur (perhaps as many as two or three orders of magnitude more), using them in
research, in lieu of accidents, alleviates to some degree the problems associated with
the prediction of rare events. However, these are only surrogates for the true criterion
of interest, so these results are only suggestive of possible relationships.
8.5.2 Out of the Air and into the Laboratory

Because of the difficulties associated with conducting naturalistic research—that 
is, research involving pilots engaged in flights—some researchers have chosen to 
conduct laboratory-based research. Typically, this research utilizes flight simulators 
of varying degrees of fidelity and flight profiles designed to expose the pilots to the 
conditions of interest. Laboratory research has the advantage of tight control over the 
stimuli (e.g., the aircraft capabilities, the weather experienced). However, because 
the risk of physical injury and death is never present in these situations, it is always a 
challenge to defend the generalizability of results from the laboratory to actual flight. 
Both naturalistic and laboratory research have their advantages and disadvantages 
and it is incumbent upon the researchers to utilize their capabilities appropriately 
and to note the limitations of their research in their reports. 

8.6 WHY ARE SOME PILOTS SAFER THAN OTHERS? 

Having touched upon some of the problems of doing research on aviation accidents, 
let us now turn to some of the work that has been done to help explain why some 
pilots are more likely to have accidents than others. For this discussion, we will 
put aside considerations of whether some aircraft are safer than others (they are), 
whether some training makes pilots safer (it does), or whether other environmental 
factors such as geographic variables, air traffic control, and maintenance impact 
safety (they all do). Instead, we will look solely at the psychological characteristics of 
the pilots—their attitudes, personality, decision-making skills—and how they influence the likelihood of being in an accident.

8.7 THE DECISION-MAKING COMPONENT OF ACCIDENTS 

Earlier in the discussion of various attempts to catalog the causes of accidents, the 
study by Jensen and Benel (1977) was described as a seminal work in that it specifically identified decision making as contributing to a large proportion of fatal general
aviation accidents. That study sparked a great deal of interest in how pilots make 
decisions that put them at risk of being in an accident, and how decision making 
might be improved. Indeed, one of the first outcomes from this focus on decision 
making was a report by Berlin et al. (1982a) in which they described a training pro¬ 
gram aimed specifically at addressing the decision-making shortcomings identified 
in the Jensen and Benel study. 

Working at Embry-Riddle Aeronautical University under the sponsorship of the 
U.S. Federal Aviation Administration (FAA), Berlin et al. (1982a) developed a training program and student manual that included the following:

• three subject areas 

• pilot—the pilot’s state of health, competency in a given situation, level 
of fatigue, and other factors affecting performance 

• aircraft—considerations of airworthiness, power plant, and perfor¬ 
mance criteria such as weight and balance 
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• environment—weather, airfield altitude and temperature, and outside 
inputs such as weather briefings or ATC instructions 

• six action ways 

• do—no do 

- Do—the pilot did something he or she should not have done. 

- No do—the pilot did not do something he or she should have done. 

• under do—over do 

- Under do—the pilot did not do enough when he or she should have 
done more. 

- Over do—the pilot did too much when he or she should have 
done less. 

• early do—late do 

- Early do—the pilot acted too early when he or she should have 
delayed acting. 

- Late do—the pilot acted too late when he or she should have 
acted earlier. 

• poor judgment behavior chain 

• One poor judgment increases the probability that another poor judgment 
will follow. 

• The more poor judgments made in sequence, the more probable that 
others will continue to follow. 

• As the poor judgment chain grows, the alternatives for safe flight decrease. 

• The longer the poor judgment chain becomes, the more probable it is 
that an accident will occur. 

• three mental processes of safe flight 

• automatic reaction 

• problem resolving 

• repeated reviewing 

• five hazardous thought patterns (having one or more of these hazardous 
thought patterns predisposed pilots to acting in ways that placed them at 
greater risk for accident involvement) 

• antiauthority 

• impulsivity 

• invulnerability 

• macho 

• external control (resignation) 

The second volume of this report (Berlin et al. 1982b) contained detailed descriptions 
and exercises corresponding to these elements, including an instrument that pilots 
could use to assess their own hazardous thought patterns. An initial, small-scale 
evaluation of the training manual was conducted with three groups of students at 
Embry-Riddle Aeronautical University. One group received the new training program, 
and the other two served as controls. In this evaluation, the experimental group 
performed significantly better than both control groups on written tests and on an 
observation flight. On the basis of these findings, the authors concluded that the 
training manual had a positive effect on the decision making of the test subjects. 

8.8 AERONAUTICAL DECISION MAKING 

Based upon the positive initial results, a series of publications using material taken 
from the Berlin et al. training manual were produced by the FAA. Each publication 
was tailored to fit the needs and experiences of a particular segment of the pilot 
population. The publications included: 

• “Aeronautical Decision Making for Helicopter Pilots” (Adams and 
Thompson 1987); 

• “Aeronautical Decision Making for Instructor Pilots” (Buch, Lawton, and 
Livack 1987); 

• “Aeronautical Decision Making for Student and Private Pilots” (Diehl et 
al. 1987); 

• “Aeronautical Decision Making for Instrument Pilots” (Jensen, Adrion, and 
Lawton 1987); 

• “Aeronautical Decision Making for Commercial Pilots” (Jensen and Adrion 
1988); and 

• “Risk Management for Air Ambulance Helicopter Operators” (Adams 1989). 

In addition to these publications, which are rather narrowly aimed at training specific 
skills in defined groups of pilots, Jensen (1995) has produced a text that covers the 
elements common to all of them in greater depth. This book also incorporates 
considerations of crew resource management (CRM), a concept that developed more or 
less in parallel with the work on pilot decision making. However, whereas the 
decision-making work was typically oriented toward general aviation pilots, the work 
on CRM grew out of reviews of accidents in the air carrier community. 

Following the release of the aeronautical decision-making (ADM) training manual 
by Berlin et al. (1982a, 1982b) and the initial evaluation of its effectiveness in the 
United States, similar studies were conducted elsewhere. In Canada, a study of air 
cadets tested their judgment during a flight by asking them to perform an unsafe 
maneuver. Pilots who had received the ADM training made correct decisions in 83% of 
the test situations, compared to 43% of the pilots who had not (Buch and Diehl 1983, 
1984). Another study of pilots attending flight schools in Canada also showed a 
significant effect of ADM training (Lester et al. 1986). In that study, 70% of the group 
that received ADM training chose the correct responses on an observation flight, 
compared to 60% of the pilots in the control group. 

Similar results were noted in a study conducted in Australia (Telfer and Ashman 
1986; Telfer 1987, 1989) using students from five flying schools in New South Wales. 
Even though the samples were quite small (only 20 subjects in total, divided among 
three groups), significant differences in favor of the ADM training were found among 
the groups. 

The results from these and other evaluations of ADM training (Diehl and Lester 
1987; Connolly and Blackwell 1987) were reviewed by Diehl (1990), who summarized 
them in terms of reductions in pilot error. These results are reproduced in Table 8.7. 

Given the consistently significant results demonstrated by ADM training in reducing 
error among pilots in these studies, it seems clear that the training does have an 
impact on pilot behavior. However, these studies left several significant issues 
unaddressed. First, how long does the effect last following training? Because all the 
evaluations were conducted immediately after the completion of training or shortly 
thereafter, the rate at which the training effect decays cannot be determined. This 
matters because, if the effect lasts only a short time, the training must be repeated 
frequently to maintain it. 

Second, which parts of the training are having an impact? Recall from the list of 
contents presented earlier that ADM training covers a fairly broad spectrum of topics, 
ranging from decision heuristics (DECIDE model) to personality traits (“five 
hazardous attitudes”). Because these were all covered at the same time during ADM 
training, the existing data cannot determine whether all of them are needed or whether 
only one or two of the components are responsible for the improvements in behavior. 

Finally, an inspection of the data in Table 8.7 suggests that the venue in which the 
training is administered may influence the magnitude of the effect. Specifically, the 
smallest effects were found in the least rigorous training environments (the aero 
clubs and fixed-base operators), whereas much larger effects were found in the 
aeronautical university and flight school environments. 
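To make the venue comparison concrete, the short Python sketch below averages the error-reduction percentages from Table 8.7 by training environment. It uses only the values reproduced in the table; the simple unweighted mean is an illustrative choice of ours, not Diehl's (1990) method, and it ignores the differing sample sizes.

```python
from collections import defaultdict

# (researchers, environment, experimental n, control n, error reduction %) from Table 8.7
TABLE_8_7 = [
    ("Berlin et al. (1982)", "Aeronautical university", 26, 24, 17),
    ("Buch and Diehl (1983)", "Flying schools", 25, 25, 40),
    ("Buch and Diehl (1983)", "College", 17, 62, 9),
    ("Telfer and Ashman (1986)", "Aero clubs", 8, 6, 8),
    ("Diehl and Lester (1987)", "Fixed-base operators", 20, 25, 10),
    ("Connolly and Blackwell (1987)", "Aeronautical university", 16, 16, 46),
]


def mean_reduction_by_environment(rows):
    """Unweighted mean error reduction (%) for each training environment."""
    groups = defaultdict(list)
    for _, environment, _, _, reduction in rows:
        groups[environment].append(reduction)
    return {env: sum(vals) / len(vals) for env, vals in groups.items()}


if __name__ == "__main__":
    summary = mean_reduction_by_environment(TABLE_8_7)
    # Print environments from smallest to largest mean effect.
    for env, mean in sorted(summary.items(), key=lambda kv: kv[1]):
        print(f"{env:<25} {mean:5.1f}%")
```

Sorted this way, the aero clubs, college, and fixed-base operator studies cluster at 8 to 10%, while the aeronautical university (31.5% on average) and flying school (40%) studies sit well above them. With only one or two studies per venue, this is a pattern worth investigating rather than a firm conclusion.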

All of these issues merit fuller investigation. Indeed, this approach of developing an 
intervention without an underlying theoretical rationale and without a firm empirical 
basis has exposed ADM training to some 


TABLE 8.7 
Results of ADM Training Evaluation Studies 

Researchers                    Environment              Experimental n   Control n   Error Reduction
Berlin et al. (1982)           Aeronautical university  26               24          17%
Buch and Diehl (1983)          Flying schools           25               25          40%
Buch and Diehl (1983)          College                  17               62           9%
Telfer and Ashman (1986)       Aero clubs                8                6           8%
Diehl and Lester (1987)        Fixed-base operators     20               25          10%
Connolly and Blackwell (1987)  Aeronautical university  16               16          46%

Source: Adapted from Diehl, A. E. 1990. In Proceedings of the 34th Meeting of the Human Factors 
Society, 1367-1371, Table 1. Santa Monica, CA: Human Factors Society. 

Note: All results significant, p < .05. 
criticism (cf. O’Hare and Roscoe 1990; Wiggins and O’Hare 1993). These criticisms 
are reflected in studies of one major component of ADM training: the five hazardous 
thoughts or, as they are sometimes called, “hazardous attitudes.” 

8.9 HAZARDOUS ATTITUDES 

All of the ADM training manuals, including the FAA publications listed earlier, have 
included an instrument for the self-assessment of hazardous attitudes. Although the 
content varies with the specific group of pilots for whom the training was designed, all 
the instruments consist of some number (typically 10) of scenarios in which an aviation 
situation is described. Five alternative explanations for the course of action taken by the 
pilot in the scenario are then provided and the pilot completing the instrument is asked 
to choose the one that he or she thinks best applies. The following example is taken 
from the FAA publication aimed at student and private pilots (Diehl et al. 1987): 

• You have just completed your base leg for a landing on runway 14 at an uncontrolled 
airport. As you turn to final, you see that the wind has changed, blowing 
from about 90 degrees. You make two sharp turns and land on runway 8. What 
was your reasoning? 

a. You believe you are a really good pilot who can safely make sudden 
maneuvers. 

b. You believe your flight instructor was overly cautious when insisting that a 
pilot must go around rather than make sudden course changes while on final 
approach. 

c. You know there would be no danger in making the sudden turns because 
you do things like this all the time. 

d. You know landing into the wind is best, so you act as soon as you can to 
avoid a crosswind landing. 

e. The unexpected wind change is a bad break, but you figure if the wind can 
change, so can you. 

Each of the five alternatives is keyed to one of the five hazardous attitudes. In this 
example, the keyed attitudes are (a) macho, (b) antiauthority, (c) invulnerability, (d) 
impulsivity, and (e) resignation. (Thus, the person who selected alternative a as the 
best explanation for the behavior of the pilot in the scenario would be espousing a 
macho attitude.) Using the scoring key provided in the training manuals, pilots can 
compute their scores for each of the five hazardous attitudes and can create a profile 
of their hazardous attitudes. From that profile, they may identify which, if any, of 
the attitudes is dominant. The text of the training manuals then provides some guidance 
on dealing with each of the hazardous attitudes and proposes some short, easily 
remembered “antidotes” for each attitude. For example, the antidote for having a 
macho attitude is that “taking chances is foolish” (Diehl et al. 1987, p. 63). 
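The scoring procedure just described can be sketched in a few lines of Python. Only item 1 of the key below comes from the text (the runway example and its keyed attitudes); items 2 and 3 and their keys are hypothetical placeholders, because the actual manuals key each item differently and typically use about 10 items.

```python
from collections import Counter

ATTITUDES = ("antiauthority", "impulsivity", "invulnerability", "macho", "resignation")

# Item 1 follows the key given in the text; items 2 and 3 are hypothetical.
SCORING_KEY = {
    1: {"a": "macho", "b": "antiauthority", "c": "invulnerability",
        "d": "impulsivity", "e": "resignation"},
    2: {"a": "impulsivity", "b": "resignation", "c": "macho",
        "d": "antiauthority", "e": "invulnerability"},
    3: {"a": "invulnerability", "b": "macho", "c": "resignation",
        "d": "impulsivity", "e": "antiauthority"},
}


def attitude_profile(answers):
    """Count how often each hazardous attitude was chosen (one choice per item)."""
    counts = Counter(SCORING_KEY[item][choice] for item, choice in answers.items())
    return {att: counts.get(att, 0) for att in ATTITUDES}


def dominant(profile):
    """Return the attitude(s) with the highest score; ties are all returned."""
    top = max(profile.values())
    return [att for att, score in profile.items() if score == top]


if __name__ == "__main__":
    profile = attitude_profile({1: "a", 2: "c", 3: "b"})
    print(profile)
    print("dominant:", dominant(profile))  # dominant: ['macho']
```

Because every item contributes exactly one point to exactly one attitude, the five scores in a profile always add up to the number of items answered.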

Three studies (Lester and Bombaci 1984; Lester and Connolly 1987; Lubner and 
Markowitz 1991) have compared the hazardous attitudes scales with other personality 
measures, including the Rotter locus of control scale and several scales from the 
Cattell 16 PF. In all of these studies, the scores for the individual hazardous 
attitudes were found to be highly correlated with each other. Moderate to low 
correlations were also observed with the other personality measures as well as with 
external criteria such as involvement in near-accidents. However, the results from 
these studies are problematic to interpret. 

Hunter (2004) has criticized the self-assessment instrument in the various FAA 
publications because it uses an ipsative format. As Anastasi (1968, p. 453) notes, 
an ipsative scale is one in which “the strength of each need is expressed, 
not in absolute terms, but in relation to the strength of the individual’s other needs... 
an individual responds by expressing a preference for one item against another.” 
In this type of scale, having a high score on one subscale forces the scores on the 
other subscales to be low. Because of this restriction, ipsative scales are subject to 
significant psychometric limitations. These limitations make the use of traditional 
statistical analysis methods (such as correlation) inappropriate in many instances. 
Hence, studies that have tried to correlate hazardous attitude scores taken from the 
self-assessment instrument with other criteria (such as scores on other psychological 
instruments) are seriously flawed and cannot, for the most part, provide useful and 
reliable information. (For a discussion of the difficulties associated with ipsative 
scoring methods, see Bartram 1996; Saville and Wilson 1991.) 

To address this problem, Hunter (2004) recommended that researchers use 
scales based on Likert items. This type of item, widely used in psychological 
research, typically consists of a statement (e.g., “I like candy”) to which the 
respondent expresses his or her degree of agreement by selecting one of several 
alternatives (e.g., “strongly agree,” “agree,” “disagree,” or “strongly disagree”). 
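By contrast with ipsative scoring, a Likert subscale is scored from its own items alone. The sketch below assumes a 1 to 4 coding of the four response options named above and uses hypothetical item identifiers; the reverse keying of some items is a common Likert practice added here for illustration, not a detail taken from Hunter's instruments.

```python
# Hypothetical 4-point coding of the response options named in the text.
RESPONSE_VALUES = {"strongly disagree": 1, "disagree": 2, "agree": 3, "strongly agree": 4}


def likert_score(responses, reverse_keyed=frozenset()):
    """Mean item score for one subscale; reverse-keyed items are flipped (1<->4, 2<->3)."""
    values = []
    for item, answer in responses.items():
        value = RESPONSE_VALUES[answer]
        if item in reverse_keyed:
            value = 5 - value
        values.append(value)
    return sum(values) / len(values)


if __name__ == "__main__":
    # Each subscale is scored independently, so a pilot can score high
    # (or low) on several hazardous attitudes at once.
    macho = likert_score({"m1": "agree", "m2": "strongly agree"})
    antiauthority = likert_score({"a1": "agree", "a2": "disagree"}, reverse_keyed={"a2"})
    print(macho, antiauthority)  # 3.5 3.0
```

Because no subscale's score constrains another's, ordinary correlational analyses are not distorted by the scoring format itself.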

Hunter and other researchers (Holt et al. 1991) have developed instruments for 
the assessment of hazardous attitudes using Likert items. In a comparison of the 
traditional instrument contained in the FAA training materials with the Likert-style 
instruments (Hunter 2004), the Likert instruments were clearly superior, both in 
reliability and in their correlations with external criteria (involvement in hazardous 
events). Using these instruments, it is possible to demonstrate empirically that 
pilots’ attitudes can affect the likelihood of their involvement in an accident. 

Hazardous attitudes are but one of several psychological constructs that have been 
considered as possible factors affecting the likelihood of accident involvement. 
Others include locus of control, risk perception, risk tolerance, and situation 
awareness. 

8.10 LOCUS OF CONTROL 

Locus of control (LOC) refers to the degree to which a person believes that what 
happens to him or her is under his or her personal control (internal LOC) or is the 
result of external factors (e.g., luck, the actions of others) over which he or she has 
no control (external LOC). This construct was first proposed by Rotter (1966), and 
since that time it has been used in a variety of settings (see Stewart 2006 for a review). 
Wichman and Ball (1983) administered the LOC scale to a sample of 200 general 
aviation pilots and found that the pilots were significantly more internal than Rotter’s 
original sample. They also found that the pilots who were higher in LOC internality 
were more likely to attend safety clinics, possibly indicating a greater safety 
orientation in this group compared to pilots with greater LOC externality. 

Several variations on the Rotter LOC scale have been constructed to assess LOC 
perceptions in particular domains. These include LOC scales specific to driving 
(Montag and Comrey 1987) and medical issues (Wallston et al. 1976). 
These development efforts were spurred by the belief, noted by Montag and Comrey 
(1987, p. 339), that “attempts to relate internality-externality to outside criteria have 
been more successful when the measures of this construct were tailored more 
specifically to the target behavior (e.g., drinking, health, affiliation), rather than using 
the more general I-E scale itself.” 

Continuing in this same vein, Jones and Wuebker (1985) developed and validated 
a safety LOC scale to predict employees’ accidents and injuries in industrial 
settings. They found that participants in the lower accident risk groups were 
significantly more internal on the safety LOC than participants in the high-risk groups. 
In a subsequent study of safety among hospital workers, Jones and Wuebker (1993) 
found that workers who held more internal safety attitudes were significantly less 
likely to have an occupational accident than employees with more external 
attitudes. 

Based upon these results, Hunter (2002) developed an aviation safety locus of 
control (AS-LOC) scale by modifying the Jones and Wuebker (1985) scale so as to 
put all the scale items into an aviation context. Two example items from the AS-LOC 
measuring internality and externality, respectively, are 

• Accidents and injuries occur because pilots do not take enough interest 
in safety. 

• Avoiding accidents is a matter of luck. 

In an evaluation study conducted using 176 pilots who completed the AS-LOC 
over the Internet, Hunter (2002) found a significant correlation (r = -.205; p < .007) 
between internality and involvement in hazardous events (specifically, the hazardous 
event scale described earlier). In contrast, a nonsignificant correlation (r = .077) was 
found between externality and the hazardous event scale score. Consistent with the 
previous research, pilots exhibited a substantially higher internal orientation than 
external orientation on the new scale. 
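A quick way to see why r = -.205 is significant with this sample while r = .077 is not is the standard t test for a Pearson correlation, t = r√(n − 2) / √(1 − r²). The sketch assumes that both correlations are ordinary Pearson coefficients computed on all 176 pilots; the exact test Hunter used is not stated here. The p values use a normal approximation to the t distribution, which is close at 174 degrees of freedom.

```python
import math
from statistics import NormalDist


def t_for_r(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def approx_two_tailed_p(t):
    """Two-tailed p from the normal approximation (reasonable for large df)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


if __name__ == "__main__":
    n = 176  # assumption: every pilot in the sample entered both correlations
    for label, r in (("internality", -0.205), ("externality", 0.077)):
        t = t_for_r(r, n)
        print(f"{label}: r = {r:+.3f}, t({n - 2}) = {t:+.2f}, "
              f"p ~ {approx_two_tailed_p(t):.3f}")
```

The internality correlation gives t of about -2.76, comfortably beyond the two-tailed .05 cutoff of roughly 1.97 for 174 degrees of freedom. The externality correlation gives t of only about 1.02.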

Similar findings were also reported by Joseph and Ganesh (2006), who administered 
the AS-LOC to a sample of 101 Indian pilots. As in the previous research, 
the Indian pilots also had significantly higher internal than external LOC scores. 
An interesting finding of this study is that the civil pilots had higher internal LOC 
scores than the military pilots. Additionally, the transport pilots had the highest 
internal scores of any of the pilot groups, followed by fighter pilots and helicopter 
pilots. Given the differences in accident rates among these groups, it would 
be interesting to investigate the degree to which these differences in AS-LOC 
scores are attributable to training, formal selection processes, or some sort of 
self-selection process. 
8.11 RISK PERCEPTION AND RISK TOLERANCE 

Risk assessment and management make up one component of the broader process of 
pilot decision making. As noted earlier, poor pilot decision making has been 
implicated as a leading factor in fatal general aviation accidents (Jensen and Benel 1977), 
and poor risk assessment can contribute significantly to poor decision making. To 
address the question of risk perception among pilots, O’Hare (1990) developed an 
aeronautical risk judgment questionnaire to assess pilots’ perceptions of the risks 
and hazards of general aviation. Hazard awareness was assessed by having pilots 

1. estimate the percentage of accidents attributable to six broad categories; 

2. rank the phases of flight by hazard level; and 

3. rank detailed causes of fatal accidents (e.g., spatial disorientation, misuse of 
flaps). 

O’Hare found that pilots substantially underestimated the risk of general aviation 
flying relative to other activities and similarly underestimated their own likelihood 
of being in an accident. Based on these results, he concluded that “an unrealistic 
assessment of the risks involved may be a factor in leading pilots to ‘press on’ into 
deteriorating weather” (O’Hare 1990, p. 599). 

This conclusion was supported by research (O’Hare and Smitheram 1995; Goh and 
Wiegmann 2001) showing that pilots who continue flight into adverse weather 
conditions have a poor perception of the risks. Interestingly, similar results are found in 
studies of young drivers (Trankle, Gelau, and Metker 1990), who have significantly 
poorer perceptions of the hazards involved in driving than older, safer drivers. 

Risk perception and risk tolerance are related and often confounded constructs. 
Hunter defined risk perception as “the recognition of the risk inherent in a situation” 
(2002, p. 3) and suggested that risk perception may be mediated both by the 
characteristics of the situation and by the characteristics of the pilot experiencing it. 
Therefore, a situation that presents a high level of risk for one person may present 
only a low risk for another. For example, clouds and low visibility may present a 
very high risk for a pilot qualified to fly only under visual meteorological conditions 
(VMC), but the same conditions would present very little risk for an experienced pilot 
qualified to fly in instrument meteorological conditions (IMC) in an appropriately 
equipped aircraft. 

The pilot must therefore accurately perceive not only the external situation but 
also his or her personal capacities. Underestimating the hazards of the external 
situation and overestimating personal capacity lead to a misperception of the risk, 
which is frequently seen as a factor in aircraft accidents. Risk perception may 
therefore be conceived as primarily a cognitive activity involving the accurate 
appraisal of external and internal states. 

By contrast, Hunter defined risk tolerance as “the amount of risk that an individual 
is willing to accept in the pursuit of some goal” (2002, p. 3). Risk tolerance may 
be affected by the person’s general tendency toward risk aversion as well as by the 
personal value attached to the goal of a particular situation. In flying, just as in 
everyday life, some goals are more important than others; the more important the 
goal, the more risk a person may be willing to accept. 

Noting that previous studies had assessed pilots’ estimates of global risk levels 
for broad categories (e.g., pilot, weather) and drawing upon the extensive driver 
research, Hunter (2002, 2006) proposed that more specific measures of risk perception 
and risk tolerance were needed. These new measures would operate at a tactical 
level involving specific aviation situations, as opposed to the strategic-level measures 
used previously. Using this approach, Hunter developed two measures of risk 
perception and three measures of risk tolerance. 

The risk perception measures included one measure (risk perception—self) that 
asked pilots about the risk they personally would experience in a set of situations and 
another (risk perception—other) that asked pilots about the risk that some other 
pilot would experience in another set of situations. Examples of both types of 
measures follow: 

• risk perception—other 

• Low ceilings obscure the tops of the mountains, but the pilot thinks that 
he can see through the pass to clear sky on the other side of the mountain 
ridges. He starts up the wide valley that gradually gets narrower. 
As he approaches the pass he notices that he occasionally loses sight of 
the blue sky on the other side. He drops down closer to the road leading 
through the pass and presses on. As he goes through the pass, the ceiling 
continues to drop and he finds himself suddenly in the clouds. He 
holds his heading and altitude and hopes for the best. 

• The pilot is in a hurry to get going and does not carefully check his 
seat, seat belt, and shoulder harness. When he rotates, the seat moves 
backward on its tracks. As it slides backward, the pilot pulls back on 
the control yoke, sending the nose of the aircraft upward. As the airspeed 
begins to decay, he strains forward to push the yoke back to a 
neutral position. 

• Just after takeoff, a pilot hears a banging noise on the passenger side of 
the aircraft. He looks over at the passenger seat and finds that he cannot 
locate one end of the seatbelt. He trims the aircraft for level flight, 
releases the controls, and tries to open the door to retrieve the seatbelt. 

• risk perception—self 

• At night, fly from your local airport to another airport about 150 miles 
away, in a well-maintained aircraft, when the weather is marginal VFR 
(3 miles visibility and 2,000 feet overcast). 

• Fly in clear air at 6,500 feet between two thunderstorms about 25 
miles apart. 

• Make a traffic pattern so that you end up turning for final with about a 
45° bank. 

In the case of the risk perception—other scale, pilots were asked to rate the risk 
for a third-party, low-time general aviation pilot, using a scale from 1 (very low risk) 
to 100 (very high risk). For the risk perception—self scale, pilots were asked to rate 
the risk if they personally were to perform the activity tomorrow (also using the 
1-100 rating scale). 

To measure risk tolerance, Hunter (2002, 2006) created three variants of a risky 
gamble scenario, each in an aviation setting. One variant involved making repeated 
flights in aircraft with a known likelihood of mechanical failure; the other two 
involved flights between thunderstorms at varying distances and through mountainous 
areas with deteriorating weather. All three variants were structured so that the 
participants could gain points (and potential prizes) by accepting the risk and taking 
a flight; however, if they failed to complete a flight (i.e., crashed), they would lose 
points. This manipulation was intended to provide a motivation to complete flights 
while at the same time encouraging some degree of caution, because crashes could 
result in the loss of real prizes. 
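The trade-off built into such a gamble can be expressed as a simple expected-value calculation. The point values below (10 points per completed flight, 90 lost per crash) are hypothetical, and treating the break-even failure probability as a yardstick for risk tolerance is our illustration, not Hunter's scoring method. The sketch only shows how gains and losses pull in opposite directions as the likelihood of failure rises.

```python
def expected_points(p_fail, gain, loss):
    """Expected points for one flight: win `gain` with prob 1 - p_fail, lose `loss` otherwise."""
    return (1 - p_fail) * gain - p_fail * loss


def break_even_failure_prob(gain, loss):
    """Failure probability at which accepting the flight has zero expected value."""
    return gain / (gain + loss)


def prob_complete_all(p_fail, flights):
    """Probability of finishing `flights` independent flights without a crash."""
    return (1 - p_fail) ** flights


if __name__ == "__main__":
    gain, loss = 10, 90  # hypothetical point values
    print("break-even p_fail:", break_even_failure_prob(gain, loss))  # 0.1
    for p in (0.05, 0.10, 0.20):
        print(f"p_fail={p:.2f}  EV={expected_points(p, gain, loss):+.1f}  "
              f"P(5 flights without a crash)={prob_complete_all(p, 5):.2f}")
```

Under these assumed values, a participant who keeps accepting flights once the failure probability exceeds 0.1 is taking gambles with negative expected value. That is one way a task of this kind could separate more risk-tolerant participants from less risk-tolerant ones.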

These measures and several others were administered to a large sample of pilots 
over the Internet. In general, the risk perception scales were supported by their 
correlations with pilot involvement in hazardous aviation events: pilots who had 
experienced more hazardous events tended to rate the scenarios as less risky than 
pilots with fewer hazardous events did. However, the measures of risk tolerance were 
not significantly correlated with hazardous aviation events. This led Hunter to conclude 
that poor perception of risk was a more important predictor of hazardous aviation 
events, and by extension of aviation accidents, than risk tolerance. 

8.12 SITUATION AWARENESS 

In common terms, situation awareness (SA) means knowing what is going on around 
oneself. For a pilot, this means knowing where the aircraft is with respect to other 
aircraft in the vicinity, important objects on the ground (e.g., runways, mountains, tall 
radio towers), and weather elements such as clouds, rain, and areas of turbulence. In 
addition, SA means knowing what the aircraft is doing at all times, both externally 
(e.g., turning, descending) and internally (e.g., fuel status, oil pressure). (Students 
of Eastern philosophy may recognize this Zen-like state as mindfulness, or being 
“one with the moment.”) Moreover, SA has both a present and a future component. 
Thus, having good SA means that a pilot knows all the things going on “right now” 
and can reliably estimate what will be happening a few minutes or a few hours from 
now. This distinction is reflected in Endsley’s definition of SA as “the perception of 
the elements in the environment within a volume of time and space, the comprehension 
of their meaning and the projection of their status in the near future” (1988, p. 87). 

Situation awareness is particularly important in Klein’s recognition-primed decision 
(RPD) model (Kaempf et al. 1996; Klein 1993). Klein’s RPD model suggests that pilots 
actually perform little real problem solving. Rather, the major activity is recognizing 
a situation and then selecting one of a limited number of solutions that have worked in 
the past. Clearly, in such a model, awareness of one’s surroundings is very important 
for detecting the changes in the environment that feed the recognition process. 
Within the RPD theoretical framework, experience is critical because it builds 
the repertoire by which one may accurately identify the salient cues and correctly 
diagnose the situation. According to Klein (2000, p. 174), 

The most common reason for poor decisions is a lack of experience. It takes a high 
degree of experience to recognize situations as typical. It takes a high degree of 
experience to build stories to diagnose problems and to mentally simulate a course of 
action. It takes a high degree of experience to prioritize cues, so workload won’t get 
too high. It takes a high degree of experience to develop expectancies and to identify 
plausible goals in a situation. 
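The recognize-then-act cycle of the RPD model, and the role experience plays in it, can be caricatured as cue matching against a stored repertoire. In the sketch below, the situation patterns, cues, actions, and the 0.6 match threshold are all hypothetical, and reducing recognition to the fraction of matching cues is a deliberate simplification. It is not Klein's formal model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """A recognized situation type and the response that has worked before."""
    name: str
    cues: frozenset
    action: str


# Hypothetical repertoire; in the RPD view, experience is what fills this list.
EXPERIENCED_REPERTOIRE = [
    Pattern("deteriorating weather",
            frozenset({"lowering ceiling", "decreasing visibility", "rain"}),
            "turn back toward known VMC"),
    Pattern("carburetor icing",
            frozenset({"rpm drop", "rough running", "visible moisture"}),
            "apply carburetor heat"),
]


def recognize(observed_cues, repertoire, min_match=0.6):
    """Return the best-matching stored pattern, or None if nothing is recognized.

    The match score is the fraction of a pattern's cues that are present.
    A None result stands for falling back on slower, deliberate problem solving.
    """
    best, best_score = None, 0.0
    for pattern in repertoire:
        score = len(pattern.cues & observed_cues) / len(pattern.cues)
        if score > best_score:
            best, best_score = pattern, score
    return best if best_score >= min_match else None


if __name__ == "__main__":
    cues = {"lowering ceiling", "rain"}
    print(recognize(cues, EXPERIENCED_REPERTOIRE).action)  # turn back toward known VMC
    print(recognize(cues, []))  # None: no repertoire to draw on
```

The same cues that immediately suggest an action to the "experienced" repertoire produce no recognition at all from an empty one, which echoes Klein's point that recognition depends on accumulated experience.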

Although SA is an intriguing construct, it could be argued that it is simply a 
metaconstruct, incorporating some or all of the other, more basic constructs previously 
discussed. For example, from Endsley’s (1988) definition, SA would subsume the risk 
perception elements discussed earlier because proper detection and evaluation of the 
cues associated with, for example, deteriorating weather conditions would fall within 
the definition of SA. The same argument could be made regarding self-knowledge of 
internal states such as attitudes. The question, therefore, is whether SA is something 
more than the sum of the constituent parts. Is it simply another category to which 
behavior and accidents may be consigned, without delving into an understanding of 
why they occur? As noted earlier, describing is not the same thing as explaining. The 
present authors suggest that the latter description is more accurate, but the interested 
reader may wish to consult the literature. (The book by Endsley and Garland [2000] 
is a good source.) 

8.13 AVIATION WEATHER ENCOUNTERS 

Over the years, encounters with adverse weather have remained one of the largest 
single causes of fatal general aviation accidents. Particularly interesting are those 
instances in which the pilot continued a flight from visual to instrument conditions 
and subsequently lost control of the aircraft or struck the ground while trying to exit 
the weather. Several researchers have examined these accidents from a variety of 
perspectives. One such perspective is to focus on these events as a plan continuation 
error (Orasanu, Martin, and Davison 2001). This perspective suggests that pilots 
fail to alter their plans when unforeseen conditions are encountered that make the 
original plan untenable. This failure can be attributed to the risk perception and risk 
tolerance constructs suggested by Hunter (2006). 

It can also be interpreted in terms of sunk costs (O’Hare and Smitheram 1995). The 
sunk-cost concept attempts to explain plan continuation error as arising from the 
inherent desire of the pilot not to waste previous efforts. That is, once a trip has been initiated, 
each minute of the trip represents the expenditure of resources (time and money) that will 
be lost if the pilot is forced to return to the origination point without completing the flight. 
Early in a flight, this potential loss is relatively small, but as the duration of the flight 
grows, so does the potential loss. This potential loss (the “sunk cost” in accounting terms) 
then represents a motivation to continue the flight, even into marginal conditions. 
O’Hare and Owen (1999) tested this concept by having pilots fly a simulated 
cross-country flight in which they encountered adverse weather either early or late 
in the flight. The sunk costs concept would predict that the pilots who encountered 
the weather later in the flight would be more likely to press on in an attempt to 
reach their destination. However, in this experiment, the results failed to support 
that hypothesis: A majority of pilots in both conditions diverted their flights. Thus, 
the validity of this concept as an explanation for pilot behavior in the face of adverse 
weather is questionable. 

In a different approach to understanding pilot behavior with respect to weather, 
Hunter, Martinussen, and Wiggins (2003) used a mathematical modeling technique 
to examine the manner in which pilots combined information about visibility, cloud 
ceiling, precipitation, and terrain to make judgments about the safety of a flight. 
In this study, pilots in the United States, Norway, and Australia were given three 
maps depicting flights in their respective countries. One map depicted a flight over 
level terrain, and another showed a flight over mountainous terrain. The final map 
depicted a flight over a large body of water. 

In a scenario-based judgment task, 326 American, 104 Norwegian, and 51 Australian
pilots then rated the safety of each of 27 weather scenarios on each of the three routes.
The 27 weather scenarios were based on combinations of varying levels of visibility,
ceiling, and precipitation. These safety ratings were used to develop an individual
regression equation for each pilot. (For a discussion of regression equations, see
Chapter 2 on statistics.) Each pilot's regression equation described the process he or
she used to combine the information when assigning the safety ratings.
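To make the idea of an individual regression equation concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the 27 scenarios form a full 3 × 3 × 3 design in which visibility, ceiling, and precipitation each take three levels coded -1, 0, and +1; the coding, the rating values, and the simulated pilot are hypothetical and are not taken from Hunter, Martinussen, and Wiggins (2003). Under that assumption the predictor columns are balanced and mutually orthogonal, so the least-squares weights reduce to simple sums.

```python
from itertools import product

# Hypothetical coding: each cue has three levels, -1 (poor), 0 (marginal), +1 (good).
LEVELS = (-1, 0, 1)
CUES = ("visibility", "ceiling", "precipitation")

# Assumed full factorial design: 3 x 3 x 3 = 27 weather scenarios.
SCENARIOS = list(product(LEVELS, repeat=len(CUES)))


def fit_pilot_weights(ratings):
    """Fit rating = b0 + b1*visibility + b2*ceiling + b3*precipitation.

    In a balanced full factorial with centered codes, each predictor column
    sums to zero and the columns are mutually orthogonal, so each ordinary
    least-squares slope is sum(x * y) / sum(x * x) and the intercept is the
    mean rating.
    """
    if len(ratings) != len(SCENARIOS):
        raise ValueError("expected one rating per scenario")
    intercept = sum(ratings) / len(ratings)
    weights = {}
    for j, cue in enumerate(CUES):
        sxy = sum(s[j] * y for s, y in zip(SCENARIOS, ratings))
        sxx = sum(s[j] ** 2 for s in SCENARIOS)
        weights[cue] = sxy / sxx
    return intercept, weights


# A simulated (hypothetical) pilot whose ratings weigh ceiling most heavily.
ratings = [4.0 + 1.0 * v + 2.0 * c + 0.5 * p for v, c, p in SCENARIOS]
b0, w = fit_pilot_weights(ratings)
print(b0, w)  # 4.0 {'visibility': 1.0, 'ceiling': 2.0, 'precipitation': 0.5}
```

The fitted weights recover the simulated pilot's emphasis on ceiling; comparing such weights across pilots is one way to see how different individuals combine the same cues.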

Two interesting results were observed. First, the safety ratings for the 27 scenarios 
were very similar for the three diverse groups of pilots. Second, for each group, the 
compensatory model of information use was favored over noncompensatory models. 
The use of a compensatory weather model means that a pilot might decide that conditions
are suitable for flight when the ceiling is high (a safe situation) but the visibility
is low (an unsafe situation) because the high ceiling compensates for the low
visibility in the overall evaluation of the situation.

In contrast, in a typical noncompensatory model (referred to as the multiple-hurdle
model), each aspect of the situation is individually examined and compared to
a criterion. A decision to initiate a flight is made only if all the factors individually 
meet their respective criteria. Here, a high value on one variable cannot compensate 
for a low value on another variable. Hunter et al. argue that using a compensatory 
decision model puts inexperienced pilots at greater risk of being in an accident. 
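The difference between the two decision rules can be shown with a small Python sketch. The -1/0/+1 cue coding, the equal weights, the threshold, and the per-cue criteria are hypothetical illustrations chosen for this example, not values reported by Hunter et al. (2003).

```python
# Cue codes: -1 = unsafe, 0 = marginal, +1 = safe (hypothetical coding).
WEIGHTS = {"visibility": 1.0, "ceiling": 1.0, "precipitation": 1.0}  # hypothetical
COMPENSATORY_THRESHOLD = 0.0  # hypothetical go/no-go cut-off on the weighted sum
HURDLE_CRITERIA = {"visibility": 0, "ceiling": 0, "precipitation": 0}  # hypothetical


def compensatory_go(cues):
    """Go if the weighted sum clears the threshold; a strong cue can offset a weak one."""
    score = sum(WEIGHTS[name] * value for name, value in cues.items())
    return score >= COMPENSATORY_THRESHOLD


def multiple_hurdle_go(cues):
    """Go only if every cue individually meets its own criterion."""
    return all(cues[name] >= minimum for name, minimum in HURDLE_CRITERIA.items())


# High ceiling, low visibility, no precipitation problem.
situation = {"visibility": -1, "ceiling": 1, "precipitation": 0}
print(compensatory_go(situation))     # True: the high ceiling offsets the low visibility
print(multiple_hurdle_go(situation))  # False: visibility fails its own hurdle
```

The same situation yields opposite decisions: the compensatory rule lets the good ceiling cancel the poor visibility, whereas the multiple-hurdle rule rejects the flight as soon as one cue falls below its criterion.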

Overconfidence by pilots was investigated by Goh and Wiegmann (2001), who 
found that pilots who continued into weather conditions in a simulated flight 
reported greater confidence in their piloting abilities, even though there were no 
differences in training or experience when compared to pilots who chose to divert. 
These same pilots also judged weather and pilot error as less likely threats to flight
safety than did the pilots who diverted, and they believed themselves less vulnerable
to pilot error.
8.14 OTHER PROGRAMS TO IMPROVE SAFETY

Under the sponsorship of the FAA, researchers at Ohio State University began a 
program of research in the early 1990s aimed at developing better understanding of 
the causes of accidents among general aviation pilots, with the explicit goal of developing
interventions to improve safety. This research focused
more on the development of expertise among relatively inexperienced pilots than on
assessing hazardous thoughts or providing heuristics for decision making (Kochan 
et al. 1997). This work led to the development of three training products aimed at 
improving decision making by pilots (1) during the preflight planning process, (2) 
when making decisions in flight, and (3) when making weather-related decisions. 

The first of these three products trained pilots to recognize the hazards present 
in flights and to establish a set of minimum operating standards (usually termed 
“personal minimums”) that would create a buffer against those hazards (Kirkbride 
et al. 1996). For example, although it is legal to fly at night in the United States with 
4 miles’ visibility and a ceiling of 2,000 feet, a prudent pilot lacking an instrument 
rating might elect to fly at night only when the visibility is greater than 8 miles and 
the ceiling is over 5,000 feet. These more stringent standards become that pilot’s 
personal minimums and are recorded in a personal checklist that pilots are encouraged
to review before each flight. Evaluations of pilot acceptance of this new training
were positive (Jensen, Guilkey, and Hunter 1998), although no evaluation was
conducted of the impact of the training on external criteria such as involvement in 
hazardous events or accidents. 
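A personal minimums checklist is, in effect, a set of stricter limits applied before each flight. The Python sketch below encodes the night-flight example from the text; the class name, the `meets` helper, and the `strict` flag (used to capture "greater than" and "over" in the personal minimums) are illustrative assumptions, and the values are the text's example, not regulatory guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NightVfrMinimums:
    """Hypothetical record of one set of night VFR minimums."""
    visibility_miles: float
    ceiling_feet: float
    strict: bool  # True: conditions must exceed the values ("greater than", "over")


# Values from the example in the text; illustrative only, not regulatory guidance.
LEGAL = NightVfrMinimums(visibility_miles=4, ceiling_feet=2000, strict=False)
PERSONAL = NightVfrMinimums(visibility_miles=8, ceiling_feet=5000, strict=True)


def meets(minimums, visibility_miles, ceiling_feet):
    """Return True if the reported conditions satisfy every minimum."""
    if minimums.strict:
        return (visibility_miles > minimums.visibility_miles
                and ceiling_feet > minimums.ceiling_feet)
    return (visibility_miles >= minimums.visibility_miles
            and ceiling_feet >= minimums.ceiling_feet)


# Legal, but inside the pilot's personal buffer: the checklist says do not go.
report = {"visibility_miles": 6, "ceiling_feet": 3500}
print(meets(LEGAL, **report), meets(PERSONAL, **report))  # True False
```

The gap between the two results is the buffer the training aims to create: conditions that are legal yet fall short of the pilot's own, more conservative standard.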

In contrast to the attempt at procedural standardization incorporated in the personal
minimums training and the hazardous thoughts training contained in several
FAA publications, a skills-based approach has been proposed that would
focus on helping pilots improve their skill at recognizing and dealing with hazardous
situations. O’Hare and colleagues (1998) used the techniques of cognitive
task analysis (CTA) and the critical decision method (CDM) form of CTA described 
by Klein, Calderwood, and MacGregor (1989) to evaluate the decision processes of 
highly experienced general aviation pilots in adverse weather situations. Use of this 
technique allowed them to identify the information cues and processes used by these 
expert pilots in making weather-related decisions. Using these data, Wiggins and 
O’Hare (1993, 2003a), under contract to the FAA, constructed a training program 
they called WeatherWise. 

WeatherWise is a computer-based training program “designed to provide visual 
pilots with the skills necessary to recognize and respond to the cues associated with 
deteriorating weather conditions during flight” (Wiggins and O’Hare 2003b, p. 337). 
The program consists of four stages: 

• Stage 1. An assessment is made of in-flight weather conditions from still 
images to demonstrate the difficulty in making determinations of visual 
flight conditions. 

• Stage 2. An introduction is given to the salient weather cues identified in 
previous research as being used by experts to make weather decisions. 
These cues were 
• cloud base

• visibility 

• cloud coloring 

• cloud density 

• terrain clearance 

• rain 

• horizon 

• cloud type 

• wind direction 

• wind speed 

• Stage 3. A number of images of in-flight weather conditions are presented,
and participants identify the point at which a significant deterioration has
taken place. During this stage, the program advocates a rule of thumb: if a
significant deterioration occurs in three or more of the cues, a weather-related
decision (possibly a diversion) should be made.

• Stage 4. Further practice in attending to the salient weather cues is undertaken.
In this stage, participants view a sequence of in-flight video recordings
and identify the point at which conditions deteriorated below visual
flight requirements. 
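The Stage 3 rule of thumb amounts to counting deteriorated cues. The Python sketch below assumes the pilot has already judged which of the ten listed cues have deteriorated significantly (that perceptual judgment, which is what WeatherWise actually trains, is not modeled); the function name and the set-of-names input are illustrative assumptions, not part of the WeatherWise program.

```python
# The ten cues listed for Stage 2 of WeatherWise.
WEATHERWISE_CUES = (
    "cloud base", "visibility", "cloud coloring", "cloud density",
    "terrain clearance", "rain", "horizon", "cloud type",
    "wind direction", "wind speed",
)
DECISION_THRESHOLD = 3  # Stage 3 rule of thumb: three or more cues


def weather_decision_needed(deteriorated):
    """Return True when three or more recognized cues have deteriorated significantly.

    `deteriorated` is the set of cue names the pilot judges to have
    deteriorated significantly; that judgment itself is not modeled here.
    """
    unknown = set(deteriorated) - set(WEATHERWISE_CUES)
    if unknown:
        raise ValueError(f"unrecognized cues: {sorted(unknown)}")
    return len(set(deteriorated)) >= DECISION_THRESHOLD


print(weather_decision_needed({"cloud base", "visibility"}))             # False
print(weather_decision_needed({"cloud base", "visibility", "horizon"}))  # True
```

A fixed count like this is deliberately simple, which is what makes it usable as a rule of thumb in the cockpit: it prompts a decision (possibly a diversion) rather than waiting for any single cue to become unmistakable.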

This training program was evaluated using a group of 66 Australian private 
pilots, none of whom had more than 150 total flight hours. In comparison to the 
control group, who did not receive the training, the pilots who completed the 
WeatherWise training were significantly more likely to make a diversion decision
at or before the optimal point. In contrast, the pilots who did not receive the
training tended to continue on into the adverse weather conditions (Wiggins and 
O'Hare 2003b). 

In addition to the FAA, other civil aviation authorities have recognized the need to 
improve general aviation safety and have incorporated the previously mentioned training
programs as part of their national safety efforts. The civil aviation authorities of
Australia and New Zealand have adopted the personal minimums and WeatherWise 
training programs and have distributed the training to their general aviation pilots. 

In recognition of the importance of decision making to accident involvement, the 
FAA, in cooperation with a coalition of aviation industry organizations, formed a 
Joint Safety Analysis Team (JSAT) to examine general aviation ADM and to develop 
a program to improve ADM so as to reduce the number of accidents attributable to 
poor decision making. The JSAT, in turn, chartered an international panel of human 
factors experts to address the technical issues of how poor decision making contributed
to accidents and what might be done to improve aviation safety. That panel’s
recommendations, which listed over 100 specific items, were adopted without change by
the JSAT and provided to the FAA as part of its final report (Jensen et al. 2003).
Reflecting a pragmatic approach to applying current knowledge of accident causality
among general aviation pilots, the panel’s recommendations covered a wide
range of possible interventions. Some examples include: 
• Create and disseminate to pilots a weather hazard index that incorporates
the weather risks into a single graphic or number. 

• Reorganize weather briefings so as to present information related to potentially
hazardous conditions as the first and last items given to the pilot.

• Increase the use of scenario-based questions in the written examination. 

• Include training for certified flight instructors (CFIs) on risk assessment and 
management in instructional operations. 

• Produce a personal minimums checklist training program expressly for use 
by CFIs in setting their instructional practices. 

• Establish a separate weather briefing and counseling line for low-time pilots. 

• Require pitot heat to be applied automatically whenever the aircraft is
in flight. 

• Develop displays that depict critical operational variables in lieu of raw, 
unprocessed data (e.g., have fuel indicators that show remaining range or 
endurance, as well as remaining gallons of fuel). 

• Develop and disseminate training that explicitly addresses the issues 
involved in crash survivability, including crash technique, minimizing vertical
loads, and planning for crashes (water, cell phone, matches, etc.) even
on flights over hospitable terrain. 

• Develop role-playing simulations in which pilots can observe modeled 
methods of resisting social pressures and can then practice the methods. 
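The display recommendation above (showing remaining range or endurance rather than only gallons) rests on simple arithmetic: endurance is usable fuel divided by fuel burn, and range is endurance multiplied by groundspeed. The Python sketch below is a simplified illustration under stated assumptions (constant burn and groundspeed, no reserve subtracted); the function name and the aircraft figures are hypothetical.

```python
def fuel_display(usable_gallons, burn_gph, groundspeed_kt):
    """Convert raw fuel quantity into endurance (hours) and range (nautical miles).

    Simplifying assumptions: constant fuel burn and groundspeed, and no
    reserve subtracted; a real display would need to account for both.
    """
    if burn_gph <= 0:
        raise ValueError("fuel burn must be positive")
    endurance_hr = usable_gallons / burn_gph
    range_nm = endurance_hr * groundspeed_kt
    return {"gallons": usable_gallons, "endurance_hr": endurance_hr, "range_nm": range_nm}


# Hypothetical light single: 30 gal usable, 8 gal/h burn, 110 kt groundspeed.
print(fuel_display(30, 8, 110))
# {'gallons': 30, 'endurance_hr': 3.75, 'range_nm': 412.5}
```

Presenting "3.75 hours" or "412 nautical miles" spares the pilot a mental calculation at exactly the moments (headwinds, diversions) when workload is highest, which is the point of the recommendation.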

Regrettably, these interventions have not yet been implemented, even though 
they were accepted by both industry and government regulators. This is a reflection, 
perhaps, of the difficulty of making even well-regarded changes in an established 
bureaucracy and cost-conscious industry. Clearly, it is not enough for researchers to 
find better ways to keep pilots safe. They must also find ways to get their discoveries 
implemented—arguably, the more difficult of the two tasks. 

Nevertheless, some progress is being made in training pilots to be more safety 
conscious. In 2006, the AOPA Air Safety Foundation (ASF) began sending a free 
DVD on decision making to all newly rated private and instrument pilots. The scenarios
contained on the DVD focus on VFR into instrument conditions and IFR
decision making—two areas that the ASF has found to be particularly troublesome 
(Aircraft Owners and Pilots Association 2006). 

8.15 SUMMARY 

In this chapter we have examined the issue of safety from the perspective of 
aviation psychology. We have seen that although flying with large commercial air
carriers is quite safe, the situation is not so comforting in general aviation, where
the risk of involvement in a fatal aviation accident is somewhat higher than the risk of
being involved in a fatal motor vehicle accident. Curiously, anecdotal evidence
(from the responses of many general aviation pilots when this topic is raised at
flight safety seminars) suggests that general aviation pilots are largely unaware
of this differential risk and generally believe that they are safer when flying than
when driving their cars. Hence, programs to improve safety often receive little
more than lip service because the pilots involved do not really feel that they are
at risk.

Scientific research has identified several factors that place pilots at greater risk 
of accident involvement. Among those discussed earlier were feelings of invulnerability
(hazardous attitudes) and feelings of being a victim of outside forces (locus
of control), along with issues relating to recognition of the risks inherent to flight. 
From this research, programs have been developed to make pilots aware of these risk 
factors and to train them to recognize the cues that indicate situations of heightened 
risk requiring immediate action on their part. 

The advanced technology formerly found only in air carriers and executive jets is 
now working its way into the general aviation fleet. This technology will make some 
tasks easier (e.g., navigation); however, it will present its own set of unique problems
and will still require pilots to make reasoned judgments about when, where,
how, and if they should undertake a flight. The influence of pilots’ personalities and 
their skill at acquiring and using information will still be great, even in the aircraft 
of tomorrow. Safety requires a proactive approach to assessing and managing all 
the elements that influence the outcome of a flight, including the most important 
element—the human at the controls. 
9 Concluding Remarks

Everyone thinks of changing the world, but no one thinks of changing himself. 

Leo Tolstoy 


9.1 INTRODUCTION 

Notwithstanding Tolstoy’s comment, humans are not easy to change. However, we hope
that after reading this book the reader will recognize that, of all the
parts of the aviation system, the human is the part most often called upon to change.
Fortunately, one of the most characteristic traits of humans is their adaptability— 
their ability to change their behavior to fit the demands of the situation. There is 
no more dramatic demonstration of this adaptability than pushing forward on the 
controls when the aircraft has stalled and the nose is pointing downward—when all 
one’s instincts call for yanking the controls backward. 

Nevertheless, humans are not infinitely adaptable. The research on the selection 
of pilots demonstrates that some individuals are better suited than others. Research 
on accident involvement also suggests that some individuals are more likely to be in 
an accident than others—perhaps because they failed to adapt their behavior to the 
demands of a novel situation. We hope that our readers are now better aware of how 
the human interacts with the aviation system and has also developed some awareness 
of the limits of their personal capabilities and adaptability. 

By its nature, this book could only scratch the surface of aviation psychology. The 
topics covered by each of the chapters have been the subjects of multiple books and 
journal articles. However, the references and suggested readings provided in each of 
the chapters can lead the interested reader to more in-depth information on the topics.
This book, we hope, will have prepared him or her for these readings by providing
a basic knowledge of the terminology, concepts, tools, and methods of inquiry of
psychology. Building upon that foundation, the reader should now be better able to 
assess reports that purport to show the impact of some new training intervention or 
to appreciate the impact of fairly subtle changes in instrumentation design and layout 
on aircrew performance. 

Although we have focused on aviation operations, much of what we have covered 
is equally applicable to other situations. For example, the task of the pilot has much
in common with those of drivers, operators of nuclear power stations, and surgeons. Poorly
designed workstations and controls can contribute to driving accidents and reactor
meltdowns as easily as they contribute to aircraft crashes. The principles of designing
to meet the capabilities and characteristics of the operators and the tasks they are
required to perform remain the same. Only the specifics of the setting are changed. 

Along that same line, our focus in this book has been on the pilot; however, 
we recognize that he or she is only one part of an extensive team. The discussions 
regarding the pilot also apply to the people outside the flight deck, including maintenance
personnel, air traffic controllers, dispatchers, and managers of aviation organizations.
Therefore, what the reader has learned from this book may also prove useful
in other settings. 

Psychology is concerned with how individuals are alike as well as with how individuals
differ. Knowing one’s strengths and weaknesses, where one’s tendencies can
take one, and the limits of one’s personal performance envelope can help in avoiding 
situations where demands exceed capacity to respond. 

In the tables that follow, we provide links to a large number of aviation organizations,
government regulatory agencies, research centers, and other sources of information
related to aviation psychology or to aviation safety in general. These sites
provide a broad range of resources applicable to pilots, from the novice to the most
senior airline captain. We encourage the reader to visit these sites to broaden his or
her knowledge and acquire new skills. We sincerely hope that readers will use the
information from these Web sites and from this book to become more competent and
more self-aware, and that they will apply what they have learned to becoming better, safer pilots.

9.2 INTERNET RESOURCES FOR PILOTS 

The following tables contain links to the principal aviation regulatory authorities, 
military services, universities, and other entities related to aviation, aviation safety, 
and aviation psychology. Some organizations, such as the FAA, AOPA, CASA, 
and Transport Canada, have many more pages of interest than we have listed here. 
However, starting from the addresses listed in these tables, readers should be able to 
locate almost all the relevant material. 

Readers are cautioned that although all these links were valid as of February 26, 
2009, some of the organizations (particularly the FAA) change the structure of their 
Web sites without notice and without providing a means to find the relocated materials.
If that occurs, the reader can try searching for the name of the organization and
will probably find the new site. 

Be aware that many of the U.S. military sites now have protection systems that 
make them somewhat incompatible with the more popular Internet browsers (e.g., 
Internet Explorer). As a result, one’s browser may warn that there are security issues
with the site one is trying to reach and may prompt one to avoid the site. Usually,
choosing to continue will lead to the site. It just takes a bit more trust and perseverance.


Civil Aviation Authorities

Organization | Page | Link Address
International Civil Aviation Organization | Home page | http://www.icao.int/
International Civil Aviation Organization | Training on SMS | http://www.icao.int/anb/safetymanagement/training/training.html
Federal Aviation Administration | Home page | http://www.faa.gov
Federal Aviation Administration | Aviation manuals and handbooks | http://www.faa.gov/library/manuals/
Federal Aviation Administration | Aviation news | http://www.faa.gov/news/aviation_news/
Federal Aviation Administration | Aviation safety team | http://faasafety.gov/
Federal Aviation Administration | Human factors workbench | http://www.hf.faa.gov/portal/default.aspx
Federal Aviation Administration | Aviation maintenance human factors | http://www.hf.faa.gov/hfmaint/Default.aspx?tabid=275
Transport Canada | Home page | http://www.tc.gc.ca
Transport Canada | Safety management systems | http://www.tc.gc.ca/CivilAviation/SMS/menu.htm
Civil Aviation Safety Authority of Australia | Home page | http://www.casa.gov.au
Civil Aviation Authority of New Zealand | Home page | http://www.caa.govt.nz
Civil Aviation Authority of the United Kingdom | Home page | http://www.caa.co.uk


Accident Investigation Boards

Country | Organization | Link Address
Australia | Transport Safety Bureau | http://www.atsb.gov.au
Canada | Transportation Safety Board | http://www.tsb.gc.ca/
Denmark | Air Accident Investigation Board | http://www.hcl.dk/sw593.asp
France | Bureau Enquêtes-Accidents | http://www.bea-fr.org/anglaise/index.htm
Germany | Bundesstelle für Flugunfalluntersuchung | http://www.bfu-web.de/
Ireland | Air Accident Investigation Unit | http://www.aaiu.ie/
Norway | Aircraft Accident Investigation Board | http://www.aibn.no/default.asp?V_ITEM_ID=29
Sweden | Board of Accident Investigation | http://www.havkom.se/index-eng.html
Switzerland | Aircraft Accident Investigation Bureau | http://www.bfu.admin.ch/en/index.htm
The Netherlands | Transport Safety Board | http://www.onderzoeksraad.nl/en/
United Kingdom | Air Accidents Investigation Branch | http://www.aaib.gov.uk/home/index.cfm
United States | National Transportation Safety Board | http://www.ntsb.gov
Other Civilian Government Agencies

U.S. Organization | Page | Link Address
NASA | Aviation Safety Reporting System (ASRS) | http://asrs.arc.nasa.gov/
NASA | Small aircraft transportation systems | http://www.nasa.gov/centers/langley/news/factsheets/SATS.html
NASA | Aircraft icing training | http://aircrafticing.grc.nasa.gov/courses.html
NOAA | Aviation weather center | http://aviationweather.gov/
DOT | Volpe Research Center | http://www.volpe.dot.gov/hf/aviation/index.html
DOT | Transportation Safety Institute | http://www.tsi.dot.gov/


Military Organizations

Organization | Page | Link Address
U.S. Navy | School of Aviation Safety | https://www.netc.navy.mil/nascweb/sas/index.htm
U.S. Army | Combat Readiness Center (safety) | https://safety.army.mil/
U.S. Navy | Air Warfare Center (Training) | http://nawctsd.navair.navy.mil/
U.S. Army | Army Research Institute | http://www.hqda.army.mil/ari/
U.S. Department of Defense | Human Factors and Ergonomics Technical Advisory Group | http://hfetag.com/
U.S. Army | Human Research and Engineering Directorate | http://www.arl.army.mil/www/default.cfm?Action=31&Page=31
UK Ministry of Defense | Human Factors Integration Defense Technology Center | http://www.hfidtc.com/HFI_DTC_Events.htm
U.S. Air Force | Human Effectiveness Directorate | http://www.wpafb.af.mil/afrl/he/
U.S. Air Force | Office of Scientific Research | http://www.afosr.af.mil/


University Research Centers

Organization | Page | Link Address
Embry-Riddle Aeronautical University, Florida | Prescott Flight Center Web links | http://flight.pr.erau.edu/links.html
University of North Dakota | Resources | http://www.avit.und.edu/f40_Resources/f2_Podcasts/index.php
Cranfield University, United Kingdom | Aerospace research, including psychology | http://www.cranfield.ac.uk/aerospace/index.jsp
Ohio State University | Aviation Department | http://aviation.eng.ohio-state.edu/
Arizona State University | Cognitive Engineering Research Institute | http://www.cerici.com/
University of Texas | Human Factors Research Project | http://homepage.psy.utexas.edu/homepage/group/HelmreichLAB/
University of Tromsø, Norway | University Research Center | http://www2.uit.no/www/inenglish
George Mason University, Virginia | Center for Air Transportation Systems Research | http://catsr.ite.gmu.edu/
University of Otago, New Zealand | Cognitive Ergonomics and Human Decision Making Laboratory | http://psy.otago.ac.nz/cogerg/
Trinity College, Dublin | Aerospace Psychology Research Group | http://www.psychology.tcd.ie/aprg/home.html
University of Illinois at Urbana-Champaign | Institute of Aviation | http://www.aviation.uiuc.edu/aviweb/
National Aerospace Laboratory NLR, The Netherlands | NLR home page | http://www.nlr.nl/
Maastricht University, The Netherlands | University Research Center | http://www.unimaas.nl/
Monash University, Australia | Accident Research Center | http://www.monash.edu.au/muarc/
University of Graz | International summer school on aviation psychology | http://www.uni-graz.at/isap9/


Organizations

Organization | Page | Link Address
AOPA/ASF | Home page | http://www.aopa.org/asf/
AOPA | Pilot training | http://www.aopa.org/asf/online_courses/#new
AOPA Air Safety Foundation | Weather training | http://www.aopa.org/asf/publications/inst_reports2.cfm?article=5180
Flight Safety Foundation | Aviation Safety Network | http://aviation-safety.net/index.php
Flight Safety Foundation | Links | http://www.flightsafety.org/related/default.cfm
Experimental Aircraft Association | Home page | http://eaa.org/
National Association of Flight Instructors | Home page | http://www.nafinet.org/
Aviation Safety Connection | Home page | http://aviation.org/
Austrian Aviation Psychology Association | Home page | http://www.aviation-psychology.at/index.php
European Association for Aviation Psychology | Home page | http://www.eaap.net/
UK Royal Aeronautical Society | Home page | http://www.raes-hfg.com/
APA Division 19 Military Psychology | Home page | http://www.apa.org/divisions/div19/
APA Division 21 Experimental Psychology | Home page | http://www.apa.org/divisions/div21/
Australian Aviation Psychology Association | Home page | http://www.aavpa.org/home.htm
International Test Commission | Standards for psychological tests | http://www.intestcom.org/
Association for Aviation Psychology | Home page | http://www.avpsych.org/


Other

Organization | Page | Link Address
American Flyers | Provides access to FAA videos | http://www.americanflyers.net/Resources/faa_videos.asp
International Symposium in Aviation Psychology | Meeting held every other year devoted to aviation psychology | http://www.wright.edu/isap/
U.S. Army Research Laboratory | Helmet-mounted displays in helicopters | http://www.usaarl.army.mil/hmd/cp_0002_contents.htm
International Military Testing Association | Home page | http://www.internationalmta.org/
American Psychological Association | Information on tests and test development standards | http://www.apa.org/science/testing.html
Norwegian Institute of Aviation Medicine | Aeromedical research | http://flymed.no
Office of Aerospace Medicine of the FAA | Technical reports produced by the FAA on aviation psychology | http://www.faa.gov/library/reports/medical/oamtechreports/
U.S. Civil Air Patrol | Safety | http://level2.cap.gov/index.cfm?nodeID=5182
Neil Krey | CRM developers’ page | http://s92270093.onlinehome.us/CRM-Devel/resources/crmtopic.htm
Aviation Weather.Com | Aviation weather maps | http://maps.avnwx.com/
National Weather Association | Weather courses | http://www.nwas.org/committees/avnwxcourse/index.htm
International Journal of Applied Aviation Studies | Scientific articles on aviation psychology topics | http://www.faa.gov/about/office_org/headquarters_offices/arc/programs/academy/journal/
University Corporation for Atmospheric Research | Meteorology education and training | http://www.meted.ucar.edu/
Quantico Flying Club | Weather training | https://www.metocwx.quantico.usmc.mil/weather_for_aviators/pilot_tmg.htm
Aviation Human Factors* | Home page | http://www.avhf.com
SmartCockpit | Large aircraft safety issues | http://www.smartcockpit.com/
Airbus | Safety library | http://www.airbus.com/en/corporate/ethics/safety_lib/
U.S. Department of Defense | Human Systems Information Integration Analysis Center | http://iac.dtic.mil/

* In the spirit of full disclosure, it should be noted that this site is maintained by one of the authors.


Linkage Sites

Organization | Page | Link Address
Flight Safety Foundation | Links to aviation sites | http://www.flightsafety.org/related/default.cfm
Landings Web Page | Aviation safety links | http://www.landings.com/_landings/pages/safety.html
Node Works | Links to other sites | http://dir.nodeworks.com/Science/Technology/Aerospace/Aeronautics/Safety_of_Aviation/
Human Performance Center (Spider) | Links to other sites | http://spider.adlnet.gov/


Readers will have noted that all of the preceding sites are English-language
sites. Of course, the accident investigation boards have parallel native-language
sites, as do organizations such as the NLR. Undoubtedly, many other non-English
sites are also available; however, for various reasons, a majority of the research and
publications in aviation are in English, hence the preponderance of Web sites in
the English language.